SIMILARITIES BETWEEN HEARING AND SKIN
SENSATIONS¹


GEORG v. BEKESY 


Harvard University 


For many years there has been a 
large literature concerned with problems 
in hearing. Anatomy, psychology, 
physiology, otology, and recently
audiometry and communication en- 
gineering: all these fields have con- 
tributed to the investigation of hearing. 
It is not so with the psychology and 
physiology of the skin. Most text- 
books have but few pages devoted to the 
skin. 

One of the reasons for this may be 
that the too precise description of the 
skin sensations by the Greek and 
Roman authors seemed improper to 
the Mohammedan scribes, who with 
endless zeal copied the earlier manu- 
scripts. The scribes therefore simply 
left out the parts dealing with the 
skin. When in medieval times monks 
took over the job of copying manu- 
scripts in the cloisters, it is even more 
understandable that the skin was not 
even mentioned as a sense organ. To
a certain degree this attitude held on
until late in the eighteenth century.


1 Paper presented at the Philadelphia
meeting of the Eastern Psychological As-
sociation on April 11, 1958. This research
was carried out under Contract Nonr-
1866(15) between Harvard University and
the Office of Naval Research, Project
Nr142-201, Report PNR-218. Reproduction
for any purpose of the U. S. Government
is permitted.

Another reason why little progress 
has been made in the investigation of 
the skin is the fact that the skin is one 
of the most complicated of the sense 
organs. It is sensitive to continuous 
pressure, to vibrations, to electrical 
stimuli, to heat, cold, and chemical 
stimuli. Several receptors are closely 
imbedded in the skin. But at the 
same time, in spite of its high sensi- 
tivity, the skin withstands rough treat- 
ment surprisingly well. In contrast to 
this, the ear in man is located in one 
of the hardest bones, the petrous bone. 
But the ear still seems to me to be 
easier to investigate, since the various 
nerve endings along the basilar mem- 
brane are well differentiated, as is 
the whole anatomical outline. 

There are experiments, however, that 
cannot be performed on the inner
ear, and for the solution of some
problems the similarity between hearing 
and the skin sensations can be of real 
value. The purpose of this paper is 
primarily to point out some of the 
similarities between these two sense 
organs, and at the same time to indicate 
the borderline up to which these 
similarities hold. The many similarities 
that exist between the senses of hearing
and vision have led to the hope that in 
time there may evolve a unified con- 
cept of these apparently so different 
sense organs. 

The recent literature on hearing is 
readily available. For one of the best 
summaries of the sense of vibrations, 
I should like to point out the mono- 
graph of W. D. Keidel (1956). My 
interest in skin sensations started about 
30 years ago (v. Békésy, 1930a), at 
which time I was concerned about the 
general validity of Fechner’s law. In 
order to test it, I was looking for a 
sensation whose magnitude could be 
indicated or described. One sensation 
that met this criterion was directional 
hearing, in which the observer can 
simply point to the sound source. We 
carried out an experiment in which 
an earphone presented a click to each 
ear. The stimulus was the time 
difference between the two clicks. As 
the time difference was increased from 
zero, the sound image traveled from 
the middle line of the head to one 
side, and it was easy for the observer 
to point to the direction from which 
the sound seemed to come. The angle 
between the middle line and the 
direction indicated was taken as the 
measure of the sensation magnitude. 
As the loudness of the clicks is in- 
creased, the source seems to come 
closer and closer to the head, so that 
at a certain loudness level the clicks 
seem to hammer right against the skin 
of the forehead and to move across the 
forehead in accordance with the time 
difference. The movement of the 
sound image around the head is shown 
in Fig. 1, for two different observers. 
The observer could point with the tip 
of a pencil at the place on the skin 
where the sharp clicks were observed. 
To make the pointing easier, the device 
shown in Fig. 2 was used. It consisted 
of a protractor fixed on the headset 





Fig. 1. Each ear is presented with a
click through an earphone. By changing the
time delay between the two clicks, the
sound image of the fused click can be made
to travel around the head. At a particular
loudness, the sound image seems to move
along the surface of the skin of the forehead.


of the earphones. Around the axis of 
the protractor, a tube could be rotated 
which had an opening on the side 
opposite the forehead. The tube was 
closed on the lower side, but sharp 
air pulses were transmitted to the upper 
side. These pulses were produced by 
the discharges of a condenser through 
an electrodynamic loudspeaker system. 
The magnitude of the discharges and 
their time pattern were modified so 
that the skin sensation matched as 
closely as possible the sensations pro- 
duced by the acoustic clicks. The 
surprising feature of this experiment 
was that it was possible to make the 
match so good that the observer could 
hardly discriminate between the air 
puffs on the skin and the clicks in the 
earphone when both seemed to come 
from the same direction. This cer- 
tainly suggests that under some con- 
ditions the sensations produced on the 
skin and the sensations of hearing can 
be very similar. 

Of course, there are differences in 
the physical aspects of these stimuli, 
and besides this there are neurological 
differences in the sense organs. We 
should not forget that there are at 
least six different ways in which hearing
can occur: air-borne sound, mechanical
vibrations touching the skull
(bone conduction), electrical stimula- 
tion of the ear, electrical stimulation of 
the acoustic nerve, electrical stimulation 
of the cortex, and hearing without any 
stimulation (tinnitus). 

In addition to the six or more ways 
in which the ear can be stimulated, 
we have to take into consideration that 
every type of stimulus produces a 





Fig. 2. As the correct position of the
sound image on the forehead was localized,
an air puff was blown against the surface of
the skin from a small opening in a vertical
tube. The tube was moved around the head
in such a way that it coincided with the
place where the sound image of the acoustic
click was localized. The time pattern of the
air puffs could be so adjusted that most
observers could hardly separate them from
the clicks, which demonstrates that sensations
produced on the skin are, under some
circumstances, very similar to the sensations
produced by hearing.


sensation with at least eight different 
qualities: pitch, loudness, volume, 
roughness (modulation, tremolo), di- 
rection, distance, on-and-off effects 
(important for the discrimination of 
speech and music), and rhythm. All 
these qualities in hearing have their 
counterparts in the skin sensations. 
The next sections will deal first with 
the analogues and differences for the 
physical stimuli, and then with the 
neurological effects. 


THE PATTERN OF THE STIMULUS 
ALONG THE BASILAR MEMBRANE 
AND ALONG THE SURFACE OF 
THE SKIN 


There have been many theories about 
the vibration pattern of the basilar 
membrane when it is stimulated by a 
pure tone. The theory of traveling 
waves along the membrane met a good 
deal of opposition because it seemed 
impossible that a traveling wave mov- 
ing along a large section of the basilar 
membrane could produce a sensation 
that was concentrated on only a very 
small section of it, an assumption that 
is necessary to explain the frequency 
analysis in the cochlea. On the other 
hand, it seems obvious that a vibrating 
needle touching the surface of the skin 
will produce a deformation only in the 
immediate vicinity of the needle. This 
has been assumed to be the reason the 
vibrations of the needle are felt only 
immediately under the tip of the needle. 
However, a simple stroboscopic obser- 
vation of the skin surface under a 
vibrating needle (Fig. 3) shows travel- 
ing waves spreading out from the 
needle point in every direction and 
forming more or less concentric rings 
(Keidel, 1956). These waves have 
little damping, and their wave length 
decreases with the increasing frequency 
of the vibrator. If the middle of the 
arm is touched with a vibrator tip 
5 mm. in diameter at frequencies below 
Fig. 3. Traveling waves on the skin
observed stroboscopically for different
frequencies produced by a vibrator touching
the surface of the skin.


50 cps and at proper amplitude, the 
traveling waves may go around the 
whole arm, and in spite of this the 
vibration is felt only immediately under 
the tip. A person standing on a
platform that is vibrating at 50 cps
may feel the vibrations penetrate about
1 cm. deep into his feet and have no
awareness of the traveling waves on
the surface of the skin along his leg,
which can be seen under stroboscopic
illumination (v. Békésy: 1939a, 1939b,
1940). Thus it is clear that a large
area under vibration can produce a 
sensation that is limited to a very small 
spot. 

The ear too has traveling waves. 
Figure 4 shows a schematic drawing 
of the ear. The sound waves hit the 
eardrum on the left side and set it 
into vibration. These vibrations are 
transmitted to the ossicles and the 
stapes. The stapes footplate produces 
fluid displacements in the scala vestibuli 
of the inner ear, and these fluid 
displacements set the basilar membrane 
into vibration. The actual vibration 
patterns on the basilar membrane have 
been measured on the ear of a live 
guinea pig and in a preparation of 
human temporal bone during stimula- 
tion with a pure tone. The vibration 
Fig. 4. Schematic cross-section of the
ear, showing the basilar membrane in the
inner ear. Movements of the eardrum produce
traveling waves on the basilar membrane
similar to those on the skin.


patterns obtained are shown in Fig. 5 
(v. Békésy, 1947a). The curves 
represent traveling waves spreading 
from the stapes footplate along the 
basilar membrane. The traveling waves 
show maximum vibration amplitude 
at a certain place, and they have a high 
speed near the stapes and then slow 
down. The place of maximal vibration 
amplitude changes with the frequency 
of the vibrations. It is close to the 
stapes for high frequencies and farther 
away (to the right in Fig. 5) for the 
lower frequencies. The ears of all 
mammals show the same type of travel- 
ing waves. Even the ear of the frog, 
which is constructed differently, has 
been shown by W. A. van Bergeijk 
(1957) to have traveling waves 
present, as indicated in Fig. 6. 





Fig. 5. Traveling wave on the basilar
membrane for a tone of 200 cps observed
stroboscopically on a preparation of human
temporal bone for two moments in time
separated by a quarter period. Abscissa:
distance from stapes in millimeters.
Fig. 6. Inner ear of the frog, showing a
membrane with a central mass. From both
ends of the U-shaped membrane traveling
waves move toward the middle, just as in
the inner ears of mammals (after W. A.
van Bergeijk).


Here, then, is an instance of similarity 
between vibration patterns on the skin 
and on the basilar membrane, for they 
both show traveling waves. As a
matter of fact, any living tissue or
material of the same consistency shows
traveling waves if its surface is touched
by a vibrating body. But there are
differences between the traveling waves 
on the skin and those on the basilar 
membrane. On the skin, as is obvious 
from Fig. 3, the maximum of the 
vibration amplitudes is always directly 
under the tip of the vibrator, and this 
is the reason why the sensation is 
localized mainly under the vibrator. 
On the basilar membrane, however, the 
maximum vibration amplitude changes 
its place along the membrane with 
frequency, the high frequencies being 
localized near the stapes and the low 
frequencies at the other end. By this 
means, a kind of rough mechanical 
frequency analysis is performed along 
the basilar membrane. 

Since in both the auditory and the 
tactile senses the physical stimulus 
sets up systems of traveling waves, 
we may assume that these mechanical 
patterns are integrated in the nervous
system of the ear and the skin in
similar ways.


NEUROLOGICAL SIMILARITIES AND 
DIFFERENCES BETWEEN THE EAR 
AND THE SKIN 


The increase in loudness that ac- 
companies an increase in sound 
pressure offered a particularly good 
area for comparative experiments. For 
comparison of the loudness of a pure 
tone with the sensation magnitude of 
sinusoidal vibrations presented to the 
finger tip, we used the equipment 
shown in Fig. 7. It consisted of a 500- 
cps oscillator which was periodically 
and automatically switched to one of 
two attenuators. The output of the 
lower attenuator was attached to an 
earphone, whereas the upper attenuator 
was connected through a Wheatstone 
bridge to an electrodynamic driving 
unit. The Wheatstone bridge was 
used to superimpose onto the alter- 
nating currents a direct current. This 
direct current acted on the coil of the 
electromagnetic system and pushed the 
coil with the vibrating tip always up- 
wards. The d.c. pressure of the vi- 
brator on the finger tip was determined 
by the magnitude of the direct current. 
The observer’s hand and the rest of 
his finger rested on a platform P. 


Fig. 7. Equipment to compare the loudness
of a tone with the sensation magnitude
of a vibration. Onto the alternating currents
of the electrodynamic vibrator a
direct current is superimposed through a
Wheatstone bridge. The direct current
produces a constant pressure of the vibrator
tip on the finger.







First the auditory threshold was de- 
termined and then the threshold for 
the vibrations. It turned out that, in 
order for the loudness of the 500-cps 
tone to match the sensation intensity 
of the vibrations, both attenuator boxes 
had to be changed by the same amount. 
This experiment, carried out 25 years 
ago (v. Békésy: 1930a, 1930b), 
stimulated me to investigate further the
analogies between hearing and vibra- 
tion sensations. Later (v. Békésy: 
1955, 1958), it turned out that the 
close similarity between loudness and 
the sensation of vibration magnitude 
holds only for the very sensitive parts 
of the skin, like the finger tip, but not 
for the less sensitive skin of the upper 
arm. The sensation magnitude on the 
upper arm increases much faster from 
threshold than does loudness. Since 
a partially anesthetized finger tip be- 
haves the same way compared to a 
normal finger tip, it seems likely that,
when the number of functioning end 
organs is small, sensation magnitude 
increases sharply as a function of the 
stimulus. For areas on the skin with 
high sensitivity, however, it is sur- 
prising how great the similarity be- 
tween loudness and vibration sensa- 
tions is. 

The similarity of the two sense 
organs reaches its maximum when 
short clicks are used as stimuli. With 
sinusoidal vibrations the similarity is 
no longer so general. This may be 
because the end organs of the skin 
seem to react slower than the end 
organs of the ear. All the phenomena 
in which time patterns are involved 
can be discriminated by the skin only 
if the changes are slow relative to 
changes that the ear can recognize. 
For instance, if we suddenly switch a
sinusoidal vibration onto the finger 
tip, the sensation magnitude does not 
increase sharply but takes more than 
a second to become fully established 
Fig. 8. Increase with time of the sensation
magnitude of a vibration of 100 cps
suddenly applied to the finger tip.



(v. Békésy: 1939a, 1939b, 1940). 
This is shown in Fig. 8, where the 
ordinate represents the increase in the 
magnitude of the sensation. The curve 
was obtained by presenting the vibra- 
tion for short times, such as 0.3, 0.6, 
0.8, and 1.0 sec., and increasing the 
amplitudes of these impulses until the 
sensation magnitude matched that of a 
vibration presented for 2 sec. The 
ordinate shows the increase in the 
amplitudes required for the short im- 
pulses to match the 2-sec. presentation. 
In hearing it also takes time for the 
loudness to develop to its full magni- 
tude (v. Békésy, 1929), but the time 
required is only about 0.2 sec., about 
one-fifth the time required by vibratory 
sensations, as can be seen in Fig. 9. 
If the sound is suddenly switched off, 
again it takes a certain length of time 
until the sensation vanishes completely, 
and here again the decay time is much 
longer for the skin than for hearing. 


Fig. 9. Increase in the loudness of a
1,000-cps tone switched on suddenly.

Fig. 10. Threshold curve for the finger tip
at different vibration frequencies.


Since the nervous system of the 
skin works more slowly than the 
nervous system of the ear, we would 
expect that the nerves of the skin 
would not be able to follow the rhythm 
of vibrations at as high speeds as can 
the ear. Consequently, we would 
expect the threshold for vibration 
amplitudes to start to rise at very low 
frequencies with about the same slope 
as the auditory threshold rises between 
10,000 and 20,000 cps. Figure 10
shows that the threshold on the finger
tip shows its maximal sensitivity in the
frequency range between 300 and 400
cps and rises very rapidly at 1000 cps.
But it should be mentioned that the 
rise in the threshold with increasing 
frequency involves not only neural
phenomena, but also some mechanical 
factors, such as the transmission of the 
vibrations to the end organ. 

In short, we can say that the nerves 
in the skin act more slowly than the 
auditory nerves, and, to obtain com- 
parable effects, transients presented to 
the skin must be 5 to 10 times slower 
than those presented to the ear. On 
the other hand, the growth of sensation 
intensity on the finger tip is much like 
the growth of loudness in hearing. 
Why this does not also hold for the 
upper arm is of particular interest, 
since many factors indicate that the 
sensation intensity is connected with 
the density of the neural innervation 
in the skin at the point in question. 
A single end organ with its nerve fiber 
generally behaves almost in accordance 
with the all-or-none law, and this
simple nervous system will not react
below its threshold, but will react 
strongly above it. But if there is a 
large number of end organs, their 
sensitivity will vary, and the loud- 
ness increase from threshold will no 
longer consist of a single jump, but 
will spread over a larger interval. The 
smallest distance apart at which two 
vibrating points can be discriminated 
as two separate impressions is very 
much smaller on the finger tip than 
on the arm. This seems to indicate 
that the density of the end organs is 
higher on the finger tip than on the 
arm, and therefore a given vibrating 
tip will stimulate more end organs on 
the finger tip than on the arm. The 
larger number of end organs makes the 
apparent intensity of the vibration in- 
crease more slowly on the finger tip. 
We can get the same effect on the arm 
if we compare the increase in apparent 
intensity of a sharp tip with that of 
a large frame that stimulates a large 
number of end organs. Such a match 


Fig. 11. The abscissa represents the
vibration amplitude above threshold of a
point vibrator (diameter 1 mm.); the
ordinate, the vibration amplitude of a
vibrating frame that seemed equal in loudness
to the vibration sensation of the point. The
curves show that near threshold the same
change in amplitude results in a larger
change in sensation intensity for the point
than for the frame. (Vibration frequency
150 cps; amplitudes in db.)
in subjective intensity is shown in Fig.
11, for three observers. It can be seen 
that, starting from threshold, an in- 
crease of 20 db in the vibration ampli- 
tude of the frame has the same effect 
on subjective intensity as an increase 
of 5 db in the vibration amplitude of 
the needle (v. Békésy: 1955, 1958). 
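
The reasoning above, that many end organs with scattered thresholds spread the growth of sensation over a wider interval than a single all-or-none unit does, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not a model taken from the experiments: each end organ is treated as a strict all-or-none unit, thresholds are drawn from a normal distribution with a hypothetical spread of 6 db, and sensation magnitude is taken to be the fraction of units that fire. The helper db_to_amplitude_ratio uses the standard convention that amplitude ratios are expressed as 20 times the common logarithm.

```python
import random

def fraction_active(level_db, thresholds_db):
    """Fraction of all-or-none units whose threshold is at or below the level."""
    return sum(1 for t in thresholds_db if t <= level_db) / len(thresholds_db)

def growth_range_db(thresholds_db, lo=0.1, hi=0.9, step=0.5):
    """Width in db of the interval over which the active fraction rises
    from `lo` to `hi` (hypothetical measure of how gradual the growth is)."""
    levels = [i * step for i in range(-80, 161)]  # -40 db .. 80 db
    first_lo = next(l for l in levels if fraction_active(l, thresholds_db) >= lo)
    first_hi = next(l for l in levels if fraction_active(l, thresholds_db) >= hi)
    return first_hi - first_lo

def db_to_amplitude_ratio(db):
    """Convert a level difference in db to an amplitude ratio."""
    return 10 ** (db / 20)

rng = random.Random(0)                      # fixed seed for repeatability
single_unit = [20.0]                        # one end organ: a single jump
many_units = [rng.gauss(20.0, 6.0) for _ in range(500)]  # assumed spread

print("single unit growth range (db):", growth_range_db(single_unit))
print("many units growth range (db): ", growth_range_db(many_units))
print("amplitude ratio for 20 db:", db_to_amplitude_ratio(20))
print("amplitude ratio for 5 db: ", db_to_amplitude_ratio(5))
```

Under these assumptions the single unit switches on at one level, whereas the population takes many decibels to go from mostly silent to mostly active, which is the qualitative contrast between the point and the frame described in the text.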

A faster and more accurate measure- 
ment of the relative loudness increase 
near threshold can be obtained with 
an automatic audiometer (v. Békésy, 
1947b). This audiometer consists of 
a potentiometer which is driven by a 
motor in such a way that pressing a 
button causes the sound pressure to 
increase continuously, whereas releasing 
the button causes the pressure to de- 
crease continuously. Starting at a 
low value, the observer holds the 
button down until he hears the sound 
and then releases it until the sound 
is no longer noticeable, and so on. 
The fluctuations of the potentiometer 
position give to some degree a measure 
of the difference limen in decibels for 
amplitude changes at threshold. The 
same technique was used on the skin, 
with an electrodynamic vibrator re- 
placing the earphone. Figure 12 shows 
fluctuations in the threshold for vibra- 
tions at 150 cps, when the lower arm 
is stimulated by a frame (left side of 


Fig. 12. The just noticeable amplitude
variations may be much smaller when the
stimulator is a needle tip than when it is
a frame. (Records at 150 cps plotted against
time; frame on the left, needle on the right.)
figure) and by a needle tip (right
side). A comparison of the two records 
shows that the just noticeable differ- 
ences in apparent intensity require a 
much larger amplitude change with 
the frame than with the needle. At 
the same time, as a consequence of the 
smaller area stimulated, the thresholds 
obtained with the needle are about 8 
db less sensitive than those obtained 
with the frame. A consequence of 
a large nerve supply seems to be that 
the sense organ is more sensitive. 
Experiments have shown that nar- 
cotizing the skin can cut down the 
active nerve supply to such a degree 
that records taken with the frame 
have the same form (shown on the 
right of Fig. 12) as those taken with 
the needle on normal skin. 
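
The tracking procedure of the automatic audiometer can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not a description of the instrument: the observer is modeled as a hypothetical idealized listener who releases the button once the level exceeds threshold by half of a fixed band and presses it again once the level falls below threshold by the same amount. The band widths, step size, and starting level are invented; only the 8-db offset between the two thresholds is taken from the text.

```python
def track_threshold(threshold_db, band_db, rate_db_per_step=0.5,
                    start_db=0.0, n_steps=400):
    """Simulate button-press tracking: the level rises while the button is
    held (stimulus not yet noticed) and falls while it is released."""
    level = start_db
    pressed = True            # observer starts by holding the button down
    trace, reversals = [], []
    for _ in range(n_steps):
        level += rate_db_per_step if pressed else -rate_db_per_step
        trace.append(level)
        if pressed and level >= threshold_db + band_db / 2:
            pressed = False   # stimulus noticed: release, level starts to fall
            reversals.append(level)
        elif not pressed and level <= threshold_db - band_db / 2:
            pressed = True    # stimulus lost: press again, level starts to rise
            reversals.append(level)
    return trace, reversals

def excursion_db(reversals):
    """Mean peak-to-trough size of the fluctuations in the record."""
    diffs = [abs(a - b) for a, b in zip(reversals, reversals[1:])]
    return sum(diffs) / len(diffs)

# Hypothetical "frame" record: lower threshold, wide fluctuations.
_, rev_frame = track_threshold(threshold_db=30.0, band_db=8.0)
# Hypothetical "needle" record: threshold 8 db higher, narrow fluctuations.
_, rev_needle = track_threshold(threshold_db=38.0, band_db=2.0)

print("frame excursion (db): ", excursion_db(rev_frame))
print("needle excursion (db):", excursion_db(rev_needle))
```

In this sketch the size of the zigzag in the record is set entirely by the assumed band, which is why the fluctuations can be read, to some degree, as a measure of the difference limen at threshold.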

The 35-mm. length of the human 
basilar membrane contains a nerve 
supply of between 30,000 and 40,000 
nerves. The nerves are much denser 
than on the finger tip; consequently, 
even at high frequencies, where the 
stimulation is concentrated near the 
stapes footplate, great numbers of end 
organs are stimulated. But when there 
is nerve deafness, some of the nerves 
decrease their activity and, as can be 
seen in the lower curve of Fig. 13, 
the difference limen for sound pressure 
changes decreases for high-frequency 
tones which stimulate a small area on 
the basilar membrane. At the same 
time the sensitivity for the high tones 
deteriorates and the threshold goes 
down rapidly, much as was found for 
the skin in Fig. 12. In contrast to 
the curve for nerve deafness, the upper 
drawing of Fig. 13 shows the hearing 
threshold where there is a purely 
mechanical middle-ear disturbance. 
Here again, the threshold goes down 
at the high frequencies, but the difference
limen for amplitude changes stays
about the same.

Of the many analogues in hearing 
Fig. 13. The upper curve shows an audiogram,
recorded with an automatic audiometer,
of a person with conductive hearing
loss. This is a purely mechanical hearing
loss, produced by the transmission loss
of the vibrations in the middle ear. The
fluctuations in the audiogram give a measure
of the just noticeable sound pressure changes.
In mechanical hearing loss the difference
limen is independent of frequency. The
lower figure shows that in nerve deafness
(caused by shooting) the difference limen
decreases rapidly with frequency in about
the same way as the difference limen
measured on the skin when the stimulated
area is decreased in size from a large surface
to a sharp point.


and skin sensations, I should like to 
point out one more, namely, the ap- 
parent size of the sensation. It is well 
known that every sound transmitted 
to the ear through an earphone, a tube, 
or a loudspeaker seems to have a 
certain volume which is determined 
by its loudness, pitch, and time pattern. 
A hum of 50 cps may seem to occupy 
a sphere several feet in diameter, 
whereas a sharp noise of the same 
loudness seems to extend only a few 
inches. Exactly the same holds for the 
skin sensations, which may change 
their apparent size even when the 
surface area stimulated by the vibrator 
remains the same. Figure 14 shows— 
for hearing, for mechanical stimulation 
of the arm, and for electrical stimula- 
tion of the finger tip—how the apparent 
size of the sensation for sinusoidal 
stimulation decreases as the frequency 
increases. The slopes of these curves 
are all very similar. The effect seems
to be connected with the increase
in lateral inhibition for fast-occurring
changes in the stimulus.


THE USE OF THE ANALOGY BETWEEN
HEARING AND SKIN SENSATIONS 
FOR RESEARCH IN HEARING 


My main research field for many 
years has been hearing, but I have been 
drawn to investigate skin sensations 
for several reasons. In earlier times, 
theories about the vibration pattern 
of the cochlear partition and the basilar 
membrane were usually based on 
psychological observations. Since the 
number of variables is very large, no 
definitive theories evolved. Through 
direct observations in the ear of the 
living guinea pig and on preparations 
of human temporal bone, we have been 
able to make measurements on the 
vibration pattern along the basilar 
membrane. But in order for our 
conception to be complete, it seemed 
necessary to find out whether the 
vibration pattern of the traveling waves 
resulted in sensations that could ac- 
count for the discriminating powers of 


Fig. 14. As the frequency increases, the
apparent size of sensations decreases with
about the same slope for tones, mechanical
vibrations, and electrical stimulation of the
skin. (Curves: arm, mechanical; tones,
hearing; finger tip, electrical. Ordinate:
apparent size of the sensation for equal
loudness.)





Fig. 15. Model of the cochlea. On the
top of the cylinder can be seen the membrane
with the rim. On the right is a
metal bellows with a driving rod which
serves as stapes footplate. Two tubes make
it possible to fill the whole model with
water. The length of the model tube is
30 cm. With the arm resting on the rim
of the membrane, localization of the vibrations
can be observed, even through heavy
garments.


the inner ear, such as pitch discrimina- 
tion, etc. It is much the same approach 
as is used in chemistry, where, after 
an analysis has been completed, for 
the final proof a synthesis is carried 
out. 

For the synthesis, models of the 
cochlea were constructed that showed 
much the same wave patterns as those 
observed in the human cochlea (v. 
Békésy: 1955, 1958). The models 
were so-called dimensional models, of 
the type used in ship building. Several 
were made, the last one a modification 
of a model made at Göttingen (Diestel,
1954). This model consists of a plastic 
tube which has along its upper side an 
edge that vibrates in the same way as 
the basilar membrane and shows the 
same traveling waves, over a frequency 
range of two octaves. It is shown in 
Fig. 15 with a metal bellows which was 
used to put in motion the fluid inside 
the tube. 

The important problem was to pro- 
vide this mechanical model with an 
appropriate nerve supply. First we 
tried to lay a frog skin above the 
vibrating part of the model, and to 
record the nerve spikes. Later we
placed the skin of our own arm on 
the model, as can be seen in Fig. 15, 
and observed what could be felt. Im- 
mediately it was clear that, although 
the membrane was vibrating along a 
large section, only a very well defined 
narrow section seemed to produce a 
stimulus. This situation is demon- 
strated schematically in Fig. 16. The 
upper drawing shows the increase of 
the amplitudes at different points along 
the membrane when the “stapes foot- 
plate” is suddenly set into vibrations of 
constant amplitude. There is a place 
of maximal amplitude but the maxi- 
mum is very flat. The sensation, on 
the other hand, is very well defined, 
as illustrated in the lower drawing. 
All the lateral widespread vibrations 
on both sides of the maximum seem 
to be inhibited in the skin and are not 
felt. 
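
How lateral inhibition could turn a flat mechanical maximum into a sharply localized sensation can be illustrated with a short numerical sketch. It uses a generic center-surround scheme chosen purely for illustration, not the mechanism measured in these experiments: the Gaussian amplitude profile, the inhibition radius, and the inhibition weight are all hypothetical values.

```python
import math

def vibration_profile(n=101, center=50, width=20.0):
    """Broad, flat-topped amplitude pattern along the membrane (assumed Gaussian)."""
    return [math.exp(-((i - center) / width) ** 2) for i in range(n)]

def laterally_inhibited(profile, radius=15, weight=0.95):
    """Each point is excited by its own input and inhibited by the mean input
    of its neighbors within `radius`; negative results are clipped to zero."""
    n = len(profile)
    out = []
    for i in range(n):
        neighbors = [profile[j]
                     for j in range(max(0, i - radius), min(n, i + radius + 1))
                     if j != i]
        inhibition = weight * sum(neighbors) / len(neighbors)
        out.append(max(0.0, profile[i] - inhibition))
    return out

def width_at_half_max(values):
    """Number of points whose value is at least half of the peak value."""
    peak = max(values)
    return sum(1 for v in values if v >= peak / 2)

stimulus = vibration_profile()
sensation = laterally_inhibited(stimulus)

print("stimulus width at half maximum: ", width_at_half_max(stimulus))
print("sensation width at half maximum:", width_at_half_max(sensation))
print("peak position unchanged:", sensation.index(max(sensation)) == 50)
```

With these assumed parameters the inhibited pattern keeps its maximum at the same place but occupies a narrower stretch of the membrane, and the flanks of the original pattern are suppressed entirely.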

As the frequency is changed, the 


Fig. 16. Inhibition on the skin. The
upper drawing shows the development of
the vibrations along the membrane during
the onset of a continuous tone of 125 cps.
The lower drawing shows how widely these
vibrations are suppressed by the skin, so
that only a small section of the membrane
is felt as vibrating.
place of maximal stimulation moves
along the membrane. For high fre- 
quencies the maximum approaches the 
“stapes footplate” end of the mem- 
brane, and for low frequencies it moves 
toward the other end, just as in the 
normal ear. This produces a sensation
along the surface of the arm as though
only a small section (about 2 cm.) of
skin were stimulated, and this section
travels up and down the arm as the
frequency of the vibration is changed.
So far as frequency analysis of pure 
tones in the cochlea is concerned, this 
seems to represent a perfect synthesis 
in the study of the inner ear. 

But the analogy between the model 
and the ear goes even further. It 
turned out that, when not a continuous 
vibration but only an impulse of two 
cycles was presented, the maximum 
of the vibration on the skin was 
localized with the same precision as 
was a continuous long-lasting vibra- 
tion. The counterpart in hearing is 
very well known and it has always been 
one of the greatest puzzles, how the 
ear can make a frequency analysis of 
a tone that lasts only two full oscilla- 
tions with almost the same accuracy 
with which it can analyze a long tone. 
This is a fact that can not be explained 
by purely mechanical analysis. The 
experiments with the model show that 
it is really possible only by a com- 
bination of a rough mechanical analysis 
plus a very specific nervous inhibition. 

Experiments with the model shifted 
our attention to problems of inhibition 
in hearing and on the skin, since 
inhibition appears to modify these 
sensations to a very large degree. Both 
the model and the basilar membrane 
have two different kinds of inhibition. 
One is produced by the fact that the 
vibrations consist of traveling waves, 
and the different sections of the skin 
and the basilar membrane are reached 
and stimulated, not simultaneously,
but with certain time delays. This
type of inhibition serves to suppress 
those of the stimuli that started later. 
The second type of inhibition shows 
an enhancement of the sensation inten- 
sity near the maximum of the stimulus 
and produces a suppression of the 
lateral spread of the sensation when 
all the stimuli are presented at the 
same time. In the two sections that 
follow I should like to discuss these 
two types of inhibition in their various
aspects.


INHIBITIONS PRODUCED BY TIME 
DIFFERENCES IN AUDITORY AND 
TACTILE SENSATIONS 


Directional hearing is probably one 
of the best-known instances of inhibi- 
tion produced by a time difference. If 
a sound hits both ears at the same 
time, we localize the source of the 
sound in the medial plane (von Horn- 
bostel, 1923). But if the sound reaches 
one ear earlier than the other, we 
localize the sound as coming from that 
side, since it is closer to the sound 
source. In experiments on localization
it is more convenient to use two ear-
phones, supplied with two equally loud
and similar clicks, than to move a
sound source around the head.
When there is no time
difference between the two ears, the 
sound image appears in the medial 
plane. If there is a large time differ- 
ence, then the two clicks do not fuse 
and we hear two separate clicks, one 
in each ear. This is shown in the 
upper part of Drawing B of Fig. 17 
by the two circles representing the 
sound images observed in both ears 
separately. As the time delay becomes 
smaller, the images of the two sides 
suddenly fuse together, the loudness 
of the sound increases, and it appears 
in the left ear (in the figure), since 
the right ear received a delayed click. 
In the figure this is shown by the filled
dots. As the time delay becomes
smaller, the unified sound image moves
to the middle and, as the time delay
is reversed, the image moves to the
other side, where it again splits into
two separate sound images.

Fig. 17. Inhibition of clicks by time
delay. In A, two vibrators were placed on
two points on the skin 12 cm. apart, and
by means of discharging condensers two
sharp sensations with a time delay were
produced on the surface of the skin. When
the time delay is zero, the sensation is
felt in the middle between the two vibrators,
as indicated by the dashed circle. When a
time difference is present, the locus of the
sensation travels closer to the vibrator that
receives the click first. At the same time,
the loudness of the sensation increases, as
is indicated by the shading, and the spread
of the sensation diminishes. B shows the simi-
lar phenomenon of directional hearing when
one click is presented to one ear and another
click to the other ear. The time delays
required to produce complete lateralization
for the skin and for hearing are the same.

A completely analogous situation is 
found in the skin sensations. If we 
hold a thin wooden stick in both hands, 
and one end of the stick is hit sharply 
with a hammer, we have no difficulty 
stating which end of the stick was hit, 
because the traveling waves produced 
in the stick hit one hand earlier than 
the other (Katz, 1937). For measure- 
ments of this kind of phenomenon, we 
used two vibrators, each in contact 
with the thumb of one hand, or in 
contact with different sections on the 
skin of the arm. Figure 17 A shows
the effects of time delays on the
sensation produced on the arm when 
two vibrators were applied about 12 
cm. apart. Again when the time differ- 
ences were large, a separate stimulus 
was felt under each vibrator, but they 
fused into a single sensation area for 
shorter time delays. It was surprising 
to find that the time difference neces- 
sary to make the fused sound image 
travel from one side of the head to the 
other was the same as the time differ- 
ence necessary to make the vibration 
sensation travel from one area on 
the skin to the other. The nervous 
systems controlling these kinds of 
inhibition must be very similar in both 
cases. 

But there was one difference which 
disturbs the complete analogy. As 
can be seen in Fig. 17, at zero time 
delay the sensation area on the arm 
was very much larger than at the 
other time delays. For hearing, this 
was not the case; the sound image 
of the clicks, when they were in the 
middle, always appeared to be of the 
same order of magnitude, regardless 
of location. We did extensive re- 
search on this difference. In place of 
the series of clicks, we used the so- 
called rotating vibration, analogous to 
the Drehton. It consisted of two 
sinusoidal vibrations, one at 50 cps and 
the other at 50.3 cps, placed a certain 
distance from each other on the skin 
of the arm. These two beating vibra- 
tions experience a constant change in 
their phase relations, going through a 
full cycle in about 3 seconds. The 
phase changes represent time delays 
relative to each other, and the sensation 
of the vibrations therefore moves 
alternately from one vibrator to the 
other. An unexpected finding was 
that, when the distance between the 
two vibrators was 20 cm., as shown in 
Fig. 18, the sensation image could
hardly be felt for zero time delay or
zero phase, since the magnitude of the
sensation became so small that it was
below threshold. For larger phase
differences, however, the loudness in-
creased and was properly localized
under one of the vibrator tips. As
the distance between the vibrator tips
was decreased, the changes in the
sensation magnitude became smaller,
and at a distance of 2.5 cm. on the
lower arm there was almost no change
in the magnitude of the sensation, and
even its apparent lateral spread did
not change as it traveled from one side
to the other. This is exactly the same
situation as is found in hearing and
is illustrated in Drawing B of Fig. 17.

Fig. 18. Two vibrators placed a certain
distance apart on the surface of the skin
and vibrating at frequencies of 50 and 50.3
cps, respectively, produce a sensation which
travels periodically from one vibrator to the
other (Drehton). If the distance is large,
zero phase between the vibration amplitudes
diminishes the magnitude of the sensation
intensity considerably. If the distance is
smaller, there are no changes in the sensation
magnitude during the traveling. Lateral
interaction between the nerve systems under
the vibrators decreases the loudness changes.
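
The relation between the two beating vibrations and the equivalent time delay can be sketched numerically. The following is only an illustrative calculation, not part of the original apparatus; the function names are hypothetical, and the conversion of phase to delay assumes that the delay may be expressed as a fraction of one period of the lower frequency, which is a close approximation when the two frequencies are nearly equal.

```python
def beat_period(f1_cps, f2_cps):
    """Seconds needed for the relative phase to run through one full cycle."""
    return 1.0 / abs(f2_cps - f1_cps)


def equivalent_delay(t_sec, f1_cps, f2_cps):
    """Time delay (seconds) corresponding to the relative phase at time t.

    The phase difference is wrapped to the range -0.5 to 0.5 cycles and
    converted to time using one period of f1 (an assumption of this sketch).
    """
    cycles = (f2_cps - f1_cps) * t_sec
    wrapped = cycles - round(cycles)
    return wrapped / f1_cps


period = beat_period(50.0, 50.3)          # roughly 3.3 seconds
quarter = equivalent_delay(period / 4, 50.0, 50.3)
print(f"full phase cycle: {period:.2f} s")
print(f"delay after a quarter cycle: {quarter * 1000:.1f} ms")
```

Under these assumptions the phase cycle lasts a little over 3 seconds, in agreement with the value quoted in the text, and the equivalent delay never exceeds half a period of the 50-cps vibration (10 msec.).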

From this I have concluded that, 
if there is a strong lateral neural inter- 
connection between two stimulated 
places, then there is only a very small 
change in the magnitude of the sensa- 
tion as it travels from one side to the 
other. But if the lateral neural inter- 
connection is loose, as when two 
vibrator tips are placed far apart, then 
it is very difficult to localize the sensa- 
tion between the two vibrators, and 
we feel nothing. Since in the ear the 
loudness changes are minimal as a tone 
is localized from one side to the other, 
we must assume that the interaction of 
the auditory nervous system between 
the two ears must be very strong. 
Perhaps a histological section of the
branching of the auditory nerves made
by Lorente de Nó (1933), and shown
in Fig. 19, will illustrate this well-
organized interaction between the two
sides.

Fig. 19. Neural lateral interconnections
in the auditory nerve trunk (after Lorente
de Nó, 1933).

After the discrepancies between the
directional hearing and the sensation
localization on the surface of the skin
were eliminated, we tried to imitate on 
the skin directional hearing in a free 
sound field. As can be seen from Fig. 
20, two microphones were set up in a 
horizontal line about 20 cm. apart, 
corresponding to the distance between 
the two ears. Each of these micro-
phones had its own amplifier which
was attached to a model of the cochlea 
(described earlier), against which the 
arms were placed as shown in the 
figure. A condenser was discharged 
periodically through a loudspeaker and 
formed clicks of low pitch. If the 
loudspeaker was now moved around 
the room, the observer could tell when 
the loudspeaker passed through the 
medial plane, since at this point the 
sensation jumped from one arm to the 
other. At first, one had the sensation 
that the sound source passed in a 
straight line from one arm to the other. 
But if the observer was permitted to 
see the movements of the loudspeaker 
in the room and coordinate them with 
the sensations on his arms, after some 
training he began to project the skin 
sensations out into the room. The
apparent distance of the loudspeaker
was determined entirely from the loud-
ness of the sound, but the directions
were localized more and more ac-
curately. This projection of the sensa-
tion out into the room, after a period
of learning, is a very interesting
experience. The neural interaction
between the two arms is slight, and
the sensation magnitude in the medial
plane may fade out. If this is disturb-
ing, it can be avoided by placing the
two ear models on both sides of the
same arm.

Fig. 20. Models of the cochlea applied
to both arms for the investigation of
phenomena similar to stereophonic hearing.

Fig. 21. Equipment to produce rotating
tones and skin sensations with the help
of a rotating phase shifter.
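
The time difference available to the two microphones in this arrangement can be estimated with a short calculation. This is a rough sketch only: it assumes a distant source (plane wave), a speed of sound of 343 m/sec., and the 20-cm. spacing mentioned above; the function name and its parameters are hypothetical.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature


def arrival_time_difference(azimuth_deg, spacing_m=0.20,
                            c=SPEED_OF_SOUND_M_PER_S):
    """Far-field estimate of the delay between two receivers.

    azimuth_deg is measured from the medial plane; the result is
    spacing * sin(azimuth) / c, in seconds.
    """
    return spacing_m * math.sin(math.radians(azimuth_deg)) / c


for angle in (0, 30, 90):
    delay_ms = arrival_time_difference(angle) * 1000
    print(f"{angle:3d} deg -> {delay_ms:.3f} ms")
```

In this approximation the delay is zero when the source lies in the medial plane and changes sign as the source crosses it, which corresponds to the point at which the sensation jumps from one arm to the other.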

The close analogy between direc- 
tional hearing and sensations on the 
skin produced by time delays can 
be further demonstrated by the cor-
relate of rotating tones. Two beating
frequencies were produced by the
apparatus shown in Fig. 21. It con-
sists of a single oscillator with a
frequency N, which feeds its current
once through an attenuating box to an 
electromagnetic driving unit, as shown 
in the upper part of the drawing. The 
oscillator is also connected to a phase 
shifter which is constantly driven by 
a 10-rpm motor so that the frequency
leaving the phase shifter is always 1/6
cps higher than the incoming frequency.
This branch of the circuit is also 
attached to a second electromagnetic 
vibrator. The important feature of 
this circuit is that the basic frequency 
of the two tones can be changed with- 
out changing the beat frequency at all, 
which means that adjustments are very 
easy to make. 
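
The reason the beat frequency stays fixed can be shown with a small calculation. The sketch below assumes that one full revolution of the phase shifter advances the phase by exactly one cycle, so that a shifter turning at 10 rpm adds 10/60 cycle per second regardless of the oscillator setting; the function name is hypothetical.

```python
def shifted_frequency(base_cps, shifter_rpm=10.0):
    """Frequency leaving a continuously rotating phase shifter.

    Assumes one revolution equals one full cycle of added phase, so the
    offset is shifter_rpm / 60 cycles per second.
    """
    return base_cps + shifter_rpm / 60.0


for base in (50.0, 100.0, 1000.0):
    beat = shifted_frequency(base) - base
    print(f"N = {base:6.1f} cps -> beat = {beat:.4f} cps")
```

The offset depends only on the motor speed, which is why the basic frequency can be changed without readjusting the beat.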
Fig. 22. The sensations produced by
rotating vibration or rotating tone at the
extreme left or extreme right have many
qualities in common. (a) As the frequency
increases, the size of the sensation decreases,
and at the same time the lateral movements
become smaller, until finally the displacement
almost disappears. (b) The sound image of
a rotating tone is localized between the two
ears at low frequencies, but at 3,000 cps
seems to move to the top of the head. At
high frequencies, the sensation on the finger
tip is localized quite outside the finger
itself.


It is now known from the description 
of the rotating tones that at low fre- 
quencies, such as 100 cps, the sound 
image travels from one ear to the other 
in a plane that goes through the two 
ears. If the frequency is increased to 
above 1000 cps, the plane in which 
the sound image seems to rotate rises 
about 2 cm. above the axis of the ears.
If the frequency is still further in- 
creased, the sound image seems to 
rotate in a plane along the top of the 
head. I do not know any explanation 
for this phenomenon, but it can easily 
be repeated by exchanging the two 
vibrating units in Fig. 21 with two 
earphones. The question is whether
a similar phenomenon is observed on
the skin. To test it, we may put our
thumbs on the two vibrators in Fig. 21. 
When the frequency is low, the rotation 
of the skin sensation occurs in a plane 
on the surface of the skin, but, as the 
frequency is increased, the plane of 
rotation moves outside the skin (Fig. 
22, left side), and behaves in complete 
accordance with the displacements 
observed in hearing (right side of 
Fig. 22). The only difference is that 
on the skin the displacement of the 
rotating plane starts at a frequency 
half as high as in hearing. 

The rotating tones represent a very 
important tool in the investigation of 
inhibition, since there is no change at 
all in the magnitude of the stimulus 
and all the phenomena observed are 
a consequence of inhibition produced 
by time intervals alone. In the next 
section, we shall describe some kinds 
of inhibition produced by changes in 
the magnitude of the stimulus along the 
stimulated area. These are of a very 
different type. 


INHIBITION AND SUMMATION IN 
SPATIALLY DISTRIBUTED STIMULI 


If we lay a rod on the surface of 
the skin and set it into vibration 
perpendicular to the skin surface by 
attaching it to a vibrator (see left side 
of Fig. 23), then the stimulus acting 
on the skin is of equal amplitude along 
the whole length of the rod and the 
whole rod vibrates in the same phase 
with no time delay. Under these 
conditions we do not feel the rod 
vibrate along its full length, but only 
a section in the middle. The lateral 
spread of this section depends on the 
frequency of the stimulus, its ampli- 
tude, and the density of the neural 
innervation of the skin section. If the 
amplitude distribution is not even 
along the length of the rod (it can
easily be made so by pressing a mass
against one side of the rod, as is shown
on the right side of Fig. 23), then
the sensation is immediately displaced
toward the maximum of the stimulus
amplitude without losing its sharpness.
In general it is found that the sensa-
tion is very much sharper and more
centered than the stimulus distribution
that has been applied to the skin. This
sharpening effect of the sensation
distribution along the surface of the
skin is especially pronounced for stim-
uli with short presentation times. If
we press the frame illustrated on the
left side of Fig. 24 slowly down on
our arm, then we may feel the edges
of the frame. In general the edges
are more clearly realized than the
middle section. But if we tap the top
of the frame with a hammer, as in the
right-hand drawing, then the sensation
on the edges disappears, and only a
push in the middle is felt with a
lateral spread of less than a few centi-
meters. It is in this type of neural
inhibition that nature shows its superi-
ority to today’s electronics and filtering
techniques.

Fig. 23. For short periods of presenta-
tion, even a small asymmetry in the distribu-
tion of the stimulus along the skin produces
a shift in the center of the sensation, but
without enlarging the width of the field
of sensation. This is one reason why the
field of sensation moves along the membrane
in the model as the stimulating frequency
is changed.

Fig. 24. Pressure applied slowly to the
surface of the skin produces a sensation as
large as the size of the object. But if an
object is tapped against the skin, a sensation
is observed only in the center of the object.

How far neural inhibition can go 
may best be illustrated by the follow- 
ing experiment. Five vibrators were 
placed 2 cm. apart in a line, with the 
vibrator tips sticking out a small dis- 
tance through holes in a flat brass 
plate. The box with the vibrators can 
be seen in Fig. 25. The vibrators 
were put into vibration by a series 
of sharp clicks in such a way that the 
vibrator on the left side had a fre- 
quency of 20 pulses per second, and 
the frequency of each succeeding vibra- 
tor was one octave higher. The whole 
frequency range covered four octaves. 
The question is, when the arm is 
laid against the plate, and all five 
vibrators are set in motion, which 
vibrator is felt and what vibration
frequency does it have? When the
sensation magnitudes of all the vibra-
tors placed on the surface of the arm
are made equal by adjusting their
relative amplitudes, then only the vi-
brator in the middle is felt with its
corresponding frequency sensation. All
the other vibrators on both sides are
inhibited and disappear from the pic-
ture. This is very surprising if we
do not take into account the micro-
physiology. When the skin is stimu-
lated by vibrators, the end organs
respond with a series of nerve pulses
whose frequency depends on many
factors. Certainly a very large number
of different pulse frequencies are
present. It would be impossible to feel
a sensation of frequency for the vibra-
tions if many of these pulse frequencies
were not inhibited.

Fig. 25. Series of vibrators, 2 cm. apart
and increasing in frequency from left to
right. The vibrations consisted of a series
of pulses similar to one shown in the left-
hand corner. The unit was placed along
the lower arm.
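
The pulse rates of the five vibrators follow directly from the description: the lowest is 20 pulses per second and each succeeding one is an octave higher. A short sketch makes the series explicit; the function name is hypothetical and the code is offered only as an arithmetic check.

```python
import math


def octave_series(lowest_pps=20.0, count=5):
    """Pulse rates for vibrators spaced one octave apart."""
    return [lowest_pps * 2 ** k for k in range(count)]


rates = octave_series()
span_octaves = math.log2(rates[-1] / rates[0])
print(rates)
print(f"range covered: {span_octaves:.0f} octaves")
```

The series runs from 20 to 320 pulses per second, a range of four octaves, with the middle vibrator (the one that is felt) at 80 pulses per second.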

But inhibition is really not the
correct word for the phenomenon when
complete suppression is meant, since
in all the above-mentioned cases the 
inhibited sensation somehow contrib- 
uted to increasing the loudness or the 
sensation magnitude of the central 
sensation. I prefer the word “funnel- 
ing” to describe the experiment with 
the five vibrators, for, although four 
of them did not contribute to the 
sensation of vibration frequency, they 
did contribute to the intensity of the 
vibrator in the middle whose frequency 
was felt. 


FUNNELING ACTION OF THE NERVOUS 
SYSTEM IN HEARING AND SKIN 
SENSATIONS 


Funneling action can be easily
demonstrated in directional hearing.
If both ears are presented with a
click of equal loudness, but with so
large a time delay that the sound
image is localized in one ear, the click
in the other side will seem to have
been completely suppressed. That this
is not so can be shown by switching 
off the apparently suppressed click. 
Now the loudness of the sound image 
in the first ear very definitely decreases, 
and it is clear that the inhibited click 
still contributed to the loudness of the 
sound image. 

The best-known funneling action is 
probably the so-called Mach ring, or 
the law of contrast in vision. In a 
demonstration of this law, we have on 
the left side a surface that is homo- 
geneously illuminated, and on the right 
side a surface whose brightness in- 
creases continuously until a certain 
level is reached, which is then kept 
constant, according to the stimulus 
distribution shown in the upper draw-
ing of Fig. 26. In this situation the
eye tends to decrease the brightness
where the stimulus starts to increase,
and to increase the sensation where
the stimulus reaches its maximum,
as is schematically represented by the
dotted curve in the upper drawing of
Fig. 26. If the light distribution on
the retina shows a flat maximum, as in
the lower drawing of Fig. 26, then,
as a consequence of the funneling
action, the lateral spread of the sensa-
tion is considerably sharpened. The
contrast effect is not a trivial effect;
it plays a very important role in vision.
For example, when we look at a bright
spot in a dark field we do not always
see a halo around the bright spot.
This halo is produced by the scattered
light in the eye, and, though it is not
always seen, it is always present, since
on an excised eye the retina is sur-
rounded near the bright spot by a very
disturbing halo. Were it not for the
contrast effect, which suppresses the
halo, most objects would look quite
fuzzy.

Fig. 26. Mach's law of contrast in vision.
When light density is distributed along the
retina with two changes in the slope, as
shown by the solid line in the upper draw-
ing, the sensation intensity has a different
distribution, shown by the dotted line. The
arrows show how the funneling action of
the nervous tissue may be assumed to pro-
duce this type of sensation pattern. The
lower drawing shows how the law of
contrast can sharpen the sensation distribu-
tion relative to the stimulus distribution
when the maximum of the stimulus is flat.
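
The shape of the sensation curve described for the law of contrast can be imitated with a very simple lateral-inhibition calculation. The sketch below is not the mechanism proposed in the text; it is an assumed textbook-style model in which each point's response is its own stimulus minus a fraction of the average stimulus in a surrounding neighborhood. The function names, the neighborhood radius, and the inhibition weight are all hypothetical choices.

```python
def ramp_stimulus(n_low=20, n_ramp=20, n_high=20, low=1.0, high=3.0):
    """Flat low level, linear rise, then flat high level."""
    ramp = [low + (high - low) * (i + 1) / (n_ramp + 1) for i in range(n_ramp)]
    return [low] * n_low + ramp + [high] * n_high


def lateral_inhibition(stimulus, radius=5, weight=0.5):
    """Subtract a weighted average of the neighbors from each point.

    Indices beyond either end are clamped, so the flat ends of the
    stimulus produce no edge artifact.
    """
    n = len(stimulus)
    response = []
    for i in range(n):
        neighbors = [stimulus[min(max(j, 0), n - 1)]
                     for j in range(i - radius, i + radius + 1) if j != i]
        surround = sum(neighbors) / len(neighbors)
        response.append(stimulus[i] - weight * surround)
    return response


stimulus = ramp_stimulus()
response = lateral_inhibition(stimulus)
print("dip below the low plateau:", min(response) < response[0])
print("overshoot above the high plateau:", max(response) > response[-1])
```

With these assumed parameters the response dips where the stimulus begins to rise and overshoots where it reaches its maximum, which is the pattern indicated by the dotted curve of Fig. 26.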

It is not difficult to produce Mach 
rings, or the analogous contrast phe- 
nomena on the arm. The equipment 
used is shown in the lower drawing 
of Fig. 27. It consisted of a plastic 
sheet with two points cut at a and b, 
so that it could easily be bent around 
an axis perpendicular to its surface. 
A vibrator was attached to a lever 
made of tubes, and with the help of 
two forks it distributed the vibration 
amplitude along the edge of the plastic 
sheet, as shown in the drawing. The 
left-hand part of the sheet had a 
constant amplitude for one-third its 
length, and the right-hand side also 
had a constant amplitude, which was 
about three times larger than on the 
left. In the middle of the sheet the 
vibration amplitude showed a constant 
increase. Pressing the lower edge of 
the sheet against the arm produced a
small sensation on the left side and a
very strong and exaggerated sensation
of vibration maximum on the right
side. But between them, there was
a section in which no vibrations could
be felt; it was not just that nothing
was felt, but there was the feeling
that nothing was there. This sensation
of nothingness is probably produced by
inhibition of the usual small sensations
always present on the surface of the
skin, sensations that we may call internal
noise. The drawing of the sensation
at the top of the figure is a free-hand
drawing by an observer trying to show
the local distribution of the sensation
magnitude along the skin. All the
characteristics of Mach rings are
present on the skin. I think that these
same phenomena play a very important
role in the sharpening of the sensations
along the basilar membrane in hearing
(v. Békésy, 1928).

Fig. 27. In order to test the validity of
the law of contrast for skin sensations, a
plastic sheet was set in vibration with the
amplitude distribution shown in the figure.
This amplitude distribution was achieved
by an oscillating lever with a fixed point c.
The plastic sheet was flexible at points a
and b. The distribution of the intensity of
the sensation along the forearm was found
to be similar to the distribution in vision.

The skin is a very convenient place 
to demonstrate funneling action, since 
the lateral spread of a sensation can be 
easily felt and directly measured in 
centimeters along the skin surface. Ob- 
servers have no difficulty making a 
drawing of the distribution of the 
sensation magnitude along the skin 
surface. If the funneling action in- 
creases, the lateral spread of the sensa- 
tion shrinks to a center. This can be 
shown, for example, for the lateral 
spread of vibrations of 50 cps produced 
on the arm by a vibrating frame. 
When the vibration amplitude was 
20 db above threshold, the lateral 
spread of the sensation was as long 
as the edge of the frame. The distribu- 
tion of the sensation intensity is shown 
in the uppermost drawing of Fig. 28. If
we now keep these vibrations constant
and superimpose onto them a series
of sharp clicks with increasing ampli-
tudes, then we find that the clicks al-
ways have a very narrow lateral spread,
and as the loudness increases they are
able to contract the lateral spread of
the 50-cps sensation, i.e., funnel it
into their own sensation field. This
effect is shown in the consecutive
drawings in Fig. 28 from the top
downward.

Fig. 28. A sharp click has a very strong
funneling action and it can make the lateral
spread of a 50-cps vibration shrink down
to a small area. The small gray area in
each curve represents the distribution of
the sensation of a click, the outlined white
area the distribution of a 50-cps vibration
produced by a frame whose edge is 13 cm.
in length.

Fig. 29. A time delay between two vibra-
tors produces funneling toward the first
stimulus. The figure shows that the funnel-
ing is more pronounced when the distance
between the vibrators is small than when
it is large. The lateral spread of the
sensation is much smaller for a short
distance than for larger distances.

Once a funneling action is estab- 
lished in a neural pathway, it can not 
be broken off at will, but it will last 
for a certain length of time, at least 
0.5 to 0.8 sec. We can demonstrate 
this on the arm. For this purpose we
use two vibrating frames set close
together on the arm, as shown in 
Fig. 29. With a special type of pre- 
cision switch, random vibrations (sim- 
ilar to white noise) were alternately 
switched from one frame to the other 
with a full period of a half second. 
When the two edges of the frames 
are close together, only one completely 
continuous sensation is felt in the 
middle between them, without any 
pulsation at all. Both sensations are 
funneled into the same channel, and 
since one is always vibrating there is 
no intermittence. But if we increase 
the distance between them, then there 
is a value at which we can no longer 
make them fuse together and we feel 
one frame pulsating, as illustrated in 
the lower drawing of Fig. 29. At any 
one time, we can have on the arm 
only one funnel, but after a certain 
amount of training we can shift it from 
one frame to the other voluntarily. 
The same experiment can be carried 
out in hearing by setting up two loud- 
speakers in a room and feeding them 
alternately with white noise. If they 
are close together, we localize only one 
sound source, located in the middle 
between the two loudspeakers. But 
if the distance between the loudspeakers 
is increased, we hear only one of the 
loudspeakers with interrupted white 
noise (v. Békésy, 1931), and again 
our attention can be shifted from one 
loudspeaker to the other one. 


CONCLUSIONS 


A comparison of hearing and the 
sensation of vibration along the skin 
shows a surprisingly large number of 
similarities. These similarities make 
it possible to find points in common 
between the two sense organs to a 
much larger degree than is generally 
assumed. The advantage of this to 
hearing is that, by making the neces-
sary adjustments, we can postulate
some phenomena of the ear which we 
can not observe directly. 

It is particularly helpful that the 
sensitivity of the skin changes to such 
a large degree along the surface, 
which permits us to extrapolate from 
the skin to the ear. How this is done is 
indicated in Fig. 30, in which some 
sensations are charted as they change 
along the surface of the skin from the 
shoulder to the finger tip. The top 
figure indicates the most probable 
nerve density on the cortex corre- 
sponding to the different areas of the 
arm and hand. The second figure 
shows how the smallest distance 
between two points that can be dis- 
criminated decreases as we approach 
the finger tip. When the two-point 
threshold is small, the threshold is 
in general low, but at the same time 
the curves of equal loudness are quite
flat for higher sensation magnitudes,
on the skin just as in hearing. The
next figure shows that the apparent
lateral spread of a vibrating sensation
on the skin decreases with innervation
density, not only for mechanical stimu-
lation, but also for electrical stimula-
tion, as is shown in the last drawing.
Besides the qualities mentioned, a
large number of others were investi-
gated in the same way.

Fig. 30. When the density of the nerve
supply is high, the threshold of sensation
tends to be low and the loudness increases
slowly with an increase in the magnitude
of the stimulus. A rapid increase in the
loudness indicates a small nerve density
(recruitment).

Since the inner ear is more sensitive 
than the finger tip, these curves 
indicate the direction in which we have 
to extrapolate when we go from the 
shoulder, through the finger tip, to the 
inner ear. Certainly this kind of 
extrapolation suggests many new ex- 
periments in the field of hearing, and 
in skin sensations as well. 

But besides this, I have hopes of 
a different sort. The organ of Corti 
evolved step by step from the skin, 
until it reached the highly differ- 
entiated shape that can be found in 
man or in the guinea pig. The ques- 
tion is, what functions can the organ 
of Corti perform that the skin cannot? 
Already a partial answer to this ques- 
tion permits us to concentrate our 
attention during investigation of the 
organ of Corti on certain points in its 
structure, which could otherwise hardly 
be understood. At the moment, there- 
fore, I am hunting for phenomena 
which differ in hearing from phe- 
nomena observed on the skin, in order 
to find out where the superiority of the 
organ of Corti over the skin comes in.


In the usual method of constructing
an average curve for a group of 
learners, the individual scores are 
pooled for a trial of given ordinal 
position. In an alternative method, 
the scores of various learners are 
pooled for trials specified with reference 
to a performance criterion, regardless 
of the original, absolute, ordinal posi-
tion of these trials. The Vincent curve
is a well-known example of the latter
method. More recent applications by
Solomon and Wynne (1953), Hayes
(1953), and Underwood (1957) sug-
gest that the criterion reference method
may reveal details of the learning 
process which are lost in the usual type 
of curve. 

It has long been known that the 
Vincent curve produces an artifactual 
end-spurt unless certain trials are 
omitted from the analysis (Hilgard, 
1938; Melton, 1936). This raises the 
question of whether the details revealed 
by the newer applications may also be 
artifacts introduced by the method of 
analysis, rather than real behavioral 
phenomena. 


UNDERWOOD’S CYCLICAL LEARNING
CURVE


Underwood’s analysis of human 
serial-learning data (1957) produced 
a curve with a prominent and orderly 
cyclical component. Cole (1957) has 
recently discussed several methods of 
forcing noncyclical data into cyclical 
form, but none has the power of 
Underwood’s technique. 


Underwood’s curve is based on four 
kinds of points—two which determine 
the peaks and valleys of the cycles, and 
two intermediate types which will be 
considered later. The peaks represent 
criterial trials, on which 1, 2, 3, . . . 10
items of the ten-item list were first 
correctly anticipated. (Thus, the curve 
contains not one, but a whole series of 
end-spurt artifacts.) The valleys of 
the curve are arbitrarily plotted mid- 
way between adjacent peaks, on 
the horizontal axis. Their vertical 
positions are determined by finding 
each S’s worst trial for the interval 
between criteria, and taking the mean. 
The source of the periodicity seems 
obvious: If one plots points based on 
data which have been selected, alter- 
nately, for their upward and down- 
ward deviations, the resulting curve 
must rise and fall—regardless of 
whether the original fluctuations in the 
raw data were lawfully periodic or 
random. 

Underwood's method is based on the 
premise that if learning involves a 
series of systematic fluctuations, the 
criterial trials will catch these oscilla- 
tions on the upswing. However, this 
would only be true if the number of 
cycles happened to correspond to the 
number of criteria employed. The 
criteria are chosen arbitrarily by E, 
and need not be known to S, nor 
influence his behavior in any way. 
The number of peaks appearing in the 
curve will equal the number of criteria 
employed, regardless of the nature of 
the data (except when performance
improves so rapidly that successive 
criteria are attained on successive 
trials, with no room between for 
minima). If Underwood had chosen 
to be concerned only with criteria of 
2, 4, 6, 8, and 10 items correct, he 
would have found five peaks instead of 
nine. Conversely, if the data had 
actually contained 20 peaks, this fea- 
ture would have been concealed. 

A new method of analysis may be 
evaluated by applying it to data whose 
relevant characteristics are already 
known, to determine whether it reflects 
them faithfully. We have done this for a 
case where it is known that trial-to- 
trial fluctuations are random, aperiodic, 
and lawful only in a probabilistic
sense. Instead of learning a ten-item 
list, our twenty S’s tossed ten coins on 
each trial, and tried to get as many 
heads as possible. (Cole [1957] recom- 
mends unicorns as S’s for this type of 
investigation; however, we used leprechauns
for the sake of comparability
with Underwood’s human S’s.) On 
the first trial, all of the coins had tails 
on both sides, to simulate items which 
the naive S has practically no chance
of anticipating. As training pro- 
gressed, the two-tailed coins were 
gradually replaced with normal coins, 
to simulate items which the partly 
trained S might or might not anticipate
—or by two-headed coins, to simulate 
items which S has mastered. The
exchange of coins was arranged to 
make the general level of performance 
parallel that of Underwood’s Ss. Each 
exchange caused an increase in average 
score, never a decrease, so there was 
nothing analogous to a periodic function.
The results were analyzed by
Underwood’s method, and the resulting 
cyclical curve is shown as a solid line 
in Fig. 1. 
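The coin-tossing demonstration can be sketched in a few lines. This is only a rough illustration under stated assumptions: the exchange schedule in `head_probs` (one coin upgraded per trial) is hypothetical, since the actual schedule is not given, and the function names are ours. Criterial trials are taken as the first trial on which at least c heads appear.

```python
import random
from statistics import mean

N_COINS = 10
N_SUBJECTS = 20


def head_probs(trial):
    """Assumed exchange schedule: one coin is upgraded by one step per trial.

    State 0 = two-tailed (p = 0), 1 = normal (p = .5), 2 = two-headed (p = 1).
    """
    states = [0] * N_COINS
    for k in range(min(trial, 2 * N_COINS)):
        states[k % N_COINS] += 1
    return [s / 2 for s in states]


def run_subject(rng, n_trials=2 * N_COINS + 1):
    """Number of heads on each trial for one simulated S."""
    return [sum(rng.random() < p for p in head_probs(t)) for t in range(n_trials)]


def cyclical_points(scores_by_subject):
    """Peaks: mean score on criterial trials. Valleys: mean worst score between criteria."""
    peaks, valleys = {}, {}
    for c in range(1, N_COINS + 1):
        peak_scores, valley_scores = [], []
        for scores in scores_by_subject:
            hit = next(t for t, s in enumerate(scores) if s >= c)
            peak_scores.append(scores[hit])
            if c < N_COINS:
                nxt = next(t for t, s in enumerate(scores) if s >= c + 1)
                between = scores[hit + 1:nxt]  # trials strictly between two criteria
                if between:
                    valley_scores.append(min(between))
        peaks[c] = mean(peak_scores)
        if valley_scores:
            valleys[c] = mean(valley_scores)
    return peaks, valleys


if __name__ == "__main__":
    rng = random.Random(0)
    data = [run_subject(rng) for _ in range(N_SUBJECTS)]
    peaks, valleys = cyclical_points(data)
    for c in sorted(peaks):
        print(c, round(peaks[c], 2), valleys.get(c))
```

Because every trial between two criteria necessarily scores below the next criterion, each valley must lie at or below the preceding peak and strictly below the following one, whatever the data. The alternation is built into the selection rule.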

FIG. 1. Trials-to-criteria learning curve
(broken line) and a cyclical curve (solid
line) based on the coin-tossing behavior of
twenty leprechauns.

This curve does not faithfully reflect
the behavior of any individual S,
that of the “average subject,” or that
of the group as a whole. The periodicity
is due entirely to the method
of manipulating numbers, and similarly 
undulating curves would be produced 
by application of the method to any 
other series of irregularly varying 
numbers, regardless of whether they 
represented error scores, latencies, 
amplitudes, or a table of random 
numbers. 

The objection may be raised that 
the curve does, in fact, give a meaning- 
ful picture of both coin tossing and 
learning. It is true that criterial trials 
are, in general, followed by a drop in 
performance. (This is the fact behind 
the Gambler’s Fallacy.) However, 
two points must be emphasized: (a) 
This drop is only to the average level, 
not below. (b) There is nothing 
orderly about the postcriterial devia- 
tions—a positive deviation is just as 
likely midway between criteria as any- 
where else. 

Pre- and postcriterial trials. Under- 
wood’s curve includes not only the 
highly selected maxima and minima, 
but also points representing the trials 
just before and just after attainment 
of each criterion. These trials are not 
directly selected for exceptionally high 
or low scores, but only for their prox- 
imity to the exceptionally high-score 
criterial trials. Since chance does not 
carry over from trial to trial, the pre-
and postcriterial points are free of ob- 
vious bias. However, a subtler kind of 
selection is still operating to make the 
precriterial points spuriously low. 

Suppose several thousand Ss toss
10 normal coins per trial, and are 
scored by the number which fall heads 
up. A criterion of 8 heads is employed, 
but each S is tested for one additional
trial. Scores on the postcriterial trial 
will be symmetrically distributed around 
a mean of 5, with a few extreme 
scores of 0 and 10. However, the 
distribution for the precriterial trial 
will be truncated. The mode will be 
5, and the minimum 0; but the maxi- 
mum will be 7, since a higher score 
would make this a criterial trial and 
remove it from the precriterial cate- 
gory. The mean will, of course, be 
less than 5. 

If the coins were strongly biased in 
favor of heads, truncating the distribu- 
tion would cause a greater reduction in 
the mean. More generally, the amount 
of distortion produced increases as the 
average scoring level approaches the 
level defined by the criterion. In the 
case of Underwood's curve, perform- 
ance is always close to the next 
following criterion, and the depres- 
sion of precriterial points is always 
substantial. 
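The size of this depression can be worked out exactly for the coin model. The sketch below is our own illustration (the function names are not from the text) and assumes independent tosses with a common probability of heads: it computes the mean of a binomial score conditioned on falling short of the criterion.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def precriterial_mean(n, p, criterion):
    """Mean score on a trial known to have fallen short of the criterion."""
    ks = range(criterion)  # only scores 0 .. criterion-1 remain possible
    total = sum(binom_pmf(k, n, p) for k in ks)
    return sum(k * binom_pmf(k, n, p) for k in ks) / total


def depression(n, p, criterion):
    """How far truncation pulls the mean below the untruncated value n*p."""
    return n * p - precriterial_mean(n, p, criterion)


if __name__ == "__main__":
    print(precriterial_mean(10, 0.5, 8))   # normal coins, criterion of 8 heads
    print(depression(10, 0.5, 8), depression(10, 0.7, 8))
```

With ten normal coins and a criterion of 8, the precriterial mean comes to about 4.81 rather than 5. With coins biased to p = .7 (an arbitrary value chosen for illustration), the depression is several times larger, in line with the argument above.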

This type of selection affects not 
just one, but all precriterial points. 
We may therefore examine its im- 
plications for the Vincent curve, which 
consists entirely of precriterial points. 
Strictly speaking, Vincent curves are 
spuriously low throughout their length;
in practice, however, with a reasonably 
severe criterion, the error would be 
negligible in the early part of the curve. 
As learning proceeds and average 
performance approaches the criterial 
level, the criterion will be met by a 
positively deviant score. All trials on 
which this happens will, by definition, 


be removed from the precriterial 
category—usually from the upper end 
of the score distribution. The effect 
of such removals will be to depress the 
later part of the curve more than the 
early part, producing a spurious nega- 
tive acceleration. 

The effect of selection on precriterial 
points is apparent in both Underwood’s 
Fig. 1 and our own: The precriterial 
points would all fall below a line con- 
necting the postcriterial points. (This 
latter line would constitute the only 
satisfactory learning curve in these 
figures.)


THE BACKWARD CURVE 


In some situations, learning may be 
characterized by a period of little or 
no progress followed by very rapid 
improvement. This feature, concealed 
by a conventional average curve, may 
be shown clearly by the “backward 
curve,” which presents performance 
as a function of the number of trials 
before (or after) a criterion. 

As in the Vincent curve, criterial 
trials should not be plotted, since they 
are selected for excellence, and their 
average is spuriously high. In the 
case of multitrial criteria, the trial just 
before criterion should also be omitted, 
since selection operates negatively here, 
making the average spuriously low. 
The effect of plotting these biased 
points can be seen in the curves of 
response speed presented by Solomon 
and Wynne (1953) for shuttle-box 
learning in dogs. Their Fig. 12 shows 
median speed plotted with reference 
to a criterion of one trial with latency 
less than 10 sec. The criterial trial 
stands out clearly above the surround- 
ing trials. Their Fig. 15 shows the 
same data replotted with reference to 
a criterion of 10 consecutive trials with 
latencies less than 10 sec. The trial 
before criterion stands out clearly 
below the surrounding trials. The 
dramatic rise between these two biased
sets of data must, of course, be discounted
as spurious. Although eliminating
11 trials from Fig. 15 would
leave a large gap in the curve, it is im- 
portant to note that the point at issue 
would still be clearly demonstrated: A 
period of slow progress is followed by 
a period of rapid progress—all within 
the precriterial part of the curve. 
Hayes (1953), analyzing discrimina- 
tion learning in rats, used a criterion of 
9 successive correct trials. He omitted 
the criterial trials and the immediately 
preceding trial from his backward 
curve (Hayes, 1953, Fig. 4). How- 
ever, he failed to note that all pre- 
criterial trials are subject to selection. 
With this type of criterion, selection 
operates somewhat indirectly: A pre- 
criterial trial may be either right or 
wrong (except for the one trial 
immediately before criterion) and thus 
seems free of bias. However, selection
still operates in terms of nine-trial
blocks. Any such block which happens
to be all correct will be removed from
the precriterial category. Since only
correct trials are being removed, there 
must be a reduction in percentage 
correct—for individual trials as well 
as for blocks of nine. 
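The block-selection effect can be illustrated with a small simulation. This is a sketch under assumptions of our own, not a reanalysis of Hayes's data: each simulated S has a constant probability p of a correct response (so no learning occurs), the criterion is nine successive correct trials, and the trial just before the criterion run is dropped, as Hayes did. The results are pooled over all remaining precriterial trials instead of being plotted by trial position.

```python
import random


def precriterial_trials(rng, p, run_length=9):
    """Responses of one S before the criterion run, minus the trial just before it."""
    trials, streak = [], 0
    while streak < run_length:
        correct = rng.random() < p
        trials.append(correct)
        streak = streak + 1 if correct else 0
    pre = trials[:-run_length]  # everything before the criterion run
    return pre[:-1]             # drop the (necessarily wrong) trial preceding the run


def pooled_precriterial_rate(p, n_subjects=2000, seed=0):
    """Proportion correct over all retained precriterial trials of all Ss."""
    rng = random.Random(seed)
    kept = [t for _ in range(n_subjects) for t in precriterial_trials(rng, p)]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    print(pooled_precriterial_rate(0.8))
```

Although every trial is generated with the same p, the pooled precriterial proportion correct falls below p, because any run of nine correct trials is removed from the precriterial category by definition.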

Although Hayes’s curve shows a 
fairly large and rapid rise between the 
pre- and postcriterial segments, no 
significance can be attached to this 
feature, since it may be due to rapid 
learning, statistical artifact, or a com- 
bination of the two. However, it 
should be noted that the curve shows
an even more rapid rise within the
precriterial segment. This rise is not 
an artifact, and it establishes the point 
in question: Very rapid improvement 
occurs in the criterial region. 


SUMMARY 


A criterion-reference curve shows the 
average performance of a group of 
learners on trials specified with refer- 
ence to a performance criterion, re- 
gardless of the original, absolute, 
ordinal position of these trials. The 
method of construction introduces 
selection errors which may lead to 
erroneous conclusions if their effects 
are not recognized. These effects may 
be clarified by substituting a tossed- 
coin probability model for the experi- 
mental Ss. 
In recent years there has been
increasing interest on the part of psy- 
chologists in problems relating to 
language behavior. This interest has 
found expression in diverse experi- 
mental methods, theoretical view- 
points, and areas of emphasis and has 
resulted in the establishment of the 
field of psycholinguistics. The older 
interests and methods of linguistics 
itself, the study of the physical cues 
of communication, and the study of 
what is communicated have all been 
incorporated to form psycholinguis- 
tics as a discipline. 


RELATION OF THE PROBLEM TO 
PSYCHOLINGUISTICS 


Of particular interest in psychology 
has been the study of the communication
of emotive or connotative meaning.
Osgood (1952) has surveyed
theories of meaning and methods of
studying it, and has also introduced a new
method, the Semantic Differential.
The Semantic Differential consists of
a number of equal-appearing intervals
scales, each defined by a pair of polar
adjectives: “hot-cold,” “good-bad,”
etc. When words or concepts are
rated on this set of scales, it has been
found that the variation in the ratings 
can be accounted for by three common 
dimensions. This finding indicates 
that variation in connotative meaning 
is not as complex as might be supposed 
and that it is amenable to at least a 
quasi-quantitative analysis. Osgood, 
in discussing directions for further 
research (Osgood & Sebeok, 1954, p. 
180), talks of “laws of word mixture.”
Such laws would indeed represent a 
considerable step forward, for the de- 
gree of correspondence between quan- 
titative data and the real number sys- 
tem depends on the operations which 
can be performed on the data and es- 
pecially on the manner in which the 
variables combine (cf. Gulliksen, 
1956b; Weitzenhoffer, 1951). In his 
discussion, Osgood suggests that “laws
of word mixture” would follow either
something akin to vector addition, 
i.e., using the Semantic Differential, 
the coordinates of the combination 
would be the sum of the coordinates 
of the components, or there would be 
an averaging effect, the coordinates of 
the combination would be the average 
of the components. This is felt to be 
too restricted a formulation; there 
might be several “laws,” depending
on the words being mixed. 

Osgood mainly considered adjec- 
tive-noun combinations, and it may 
be that his suggestions are quite relevant
to such combinations. However,
if connotative communication is to be 
thought of as the representation of a 
space in which different words repre- 
sent projections on some coordinate 
system, it would seem that words 
which have no projections of their 
own but merely serve to stretch or
compress the projections of other 
words would be a useful adjunct. 
That is, these words would act like 
the scalar multipliers of vector 
algebra. 

Words which have this stretching
property would seem to be the intensive
adverbs such as “quite,” “very,”
and “unusually.” Subjective analysis
of the change in intensity on applying
adverbs to adjectives bears
this out. Consider “very” applied to
“bad” and “pleasant” and judged on
the evaluative dimension. “Very
bad” is more unfavorable than “bad”;
“very pleasant” is more favorable
than “pleasant.” If “bad” were represented
by a negative number,
“pleasant” by a positive one, and
“very” by a number greater than
unity, the combinations would behave
in exactly this manner. On the other
hand, suppose the same two adjectives
were modified by “slightly.”
Here, the combinations are less extreme
than the adjectives alone, but
one would still say that “slightly bad”
was unfavorable and “slightly pleasant”
was favorable. This case can
be accounted for by assuming once
again that “bad” and “pleasant” are
negative and positive numbers, respectively,
but that “slightly” is a
positive number less than one.

Implicit in the above discussion is 
the assumption that the number as- 
sociated with an adjective used alone 
is the same as that which is multiplied 
by the adverb number. That is, we 
are dealing with the same adjective 
quantity whether the adjective is 
used alone or in combination. 

In a somewhat more formal way, 
this particular “law of word mixture” 
may be stated in terms of the following 
postulates:


1. There is a number associated 
with each adjective. 


2. There is a number associated 
with each adverb. 

3. The intensity of an adverb- 
adjective combination is the product 
of these two numbers. 

4. The intensity of the adjective 
used alone is the number associated 
with it when used in combination. 

5. A set of adjectives can be chosen 
which may be scaled on a single di- 
mension on which all will have the 
same zero point. 


The last of these postulates is intro- 
duced because it will be useful in 
tying the model to data and also be- 
cause it is more parsimonious than 
allowing each adjective to have its 
own zero point. 

The formulation presented here 
should be reflected by the psychophys- 
ical scale values of adverb-adjective 
combinations of the type described. 
It is to be remembered, however, that 
scale values have an arbitrary origin 
in the sense that any set of scale 
values, other than one which already 
constitutes a ratio scale, may have a 
constant added to each member of the 
set without distorting the scale. 

The scale value of each adverb- 
adjective combination, then, could be 
expressed as 


$$x_{ij} = c_i s_j + K$$

where

$x_{ij}$ = the obtained scale value of the $i$th adverb in combination with the $j$th adjective;

$c_i$ = the multiplying value of the $i$th adverb;

$s_j$ = the psychological scale position of the $j$th adjective;

$K$ = the difference between the arbitrary zero point of the obtained scale values and the psychological zero point of the scale.





ADVERBS AS MULTIPLIERS 29 


A matrix $X$ can be formed of the
obtained scale values of adverb-adjective
combinations with the scale
values of the unmodified form of the
adjective as the first row, the adverb
subscript denoting the row, and the
adjective subscript denoting the column.
The plot of any row of this
matrix, say, $i$, against any other row
$I$ should be linear; its slope is $c_i/c_I$.
Correspondingly, any column $j$ plotted
against any other column $J$ should
also give a linear relation with slope
$s_j/s_J$. If the scale value of the unmodified
adjective is also included, it
may be thought of as simply $s_j + K$,
and the multiplying value of the adverb
in this case may be said to be
unity. This additional assumption
enables us to find the absolute value
of the $c_i$ because the slope then becomes
$c_i/1$. It is then possible to
work back, substituting the values of
the $c_i$ thus obtained to arrive at least-squares
estimates of $s_j$ and $K$. If
such a matrix is formed and is found
to be of the form indicated, then the
model may be said to have been upheld.
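The row-against-row argument can be checked on artificial data. The following is a minimal sketch, not the computing procedure used in the study: the multipliers, adjective values, and K are made-up numbers, and `fit_line` and `recover` are hypothetical helper names. It relies on the identity $x_{ij} = c_i x_{0j} + K(1 - c_i)$, where row 0 holds the unmodified adjectives, and it averages the K estimates across rows instead of carrying out a full least-squares solution.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Illustrative (assumed) parameters, not values from the study.
c = [1.0, 0.5, 1.3]               # row 0 is the unmodified adjective, c_0 = 1
s = [-2.0, -1.0, 0.1, 1.5, 2.5]   # adjective scale positions
K = 3.0

# Scale values under the model x_ij = c_i * s_j + K.
X = [[ci * sj + K for sj in s] for ci in c]


def recover(X):
    """Estimate c_i, s_j and K from X, taking row 0 as the unmodified adjectives."""
    c_hat, k_hats = [1.0], []
    for row in X[1:]:
        slope, intercept = fit_line(X[0], row)   # slope estimates c_i / 1
        c_hat.append(slope)
        if abs(1 - slope) > 1e-9:                # intercept = K * (1 - c_i)
            k_hats.append(intercept / (1 - slope))
    K_hat = sum(k_hats) / len(k_hats)            # simple average, assumes some c_i != 1
    s_hat = [x0 - K_hat for x0 in X[0]]          # x_0j = s_j + K
    return c_hat, s_hat, K_hat


if __name__ == "__main__":
    print(recover(X))
```

On error-free data of this kind the recovered values agree with the generating ones. With real scale values, the estimates would of course inherit the measurement error of the unmodified-adjective row.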


EXPERIMENTAL PROCEDURE AND 
METHOD OF ANALYSIS 


Constructing the Questionnaire 


The model was to be tested using all com- 
binations of the nine adverbs and fifteen ad- 
jectives listed in Table 1 plus the unmodified 
adjectives. Accordingly, a two-part ques- 
tionnaire was constructed to secure responses 
to the stimuli. Part I consisted of these 150 
combinations, fifteen of which were repeated, 
and 39 filler items, which were adjectives preceded
by two of the adverbs, e.g., “very
slightly admirable.” Thus there was a total 
of 204 stimuli. Table 1 also gives the fre- 
quency of usage ratings of the words used. 
It can be seen that all are quite common. 

TABLE 1

FREQUENCY RATINGS OF WORDS USED IN
COMBINATIONS

Adverb        Frequency(a)    Adjective        Frequency(a)

Slightly      42              Evil
Somewhat      50+             Wicked           36
Rather        100+            Contemptible     +
Pretty(b)     43              Immoral          2
Quite         100+            Disgust(ing)     21
Decidedly     11              Bad              100+
Unusually     11              Inferior         19
Very          100+            Ordinary         50+
Extremely     35              Average          50+
                              Nice             100+
                              Good             100+
                              Pleasant         100+
                              Charming         31
                              Admirable        10
                              Lovable          3

(a) Frequency per million running words as given in
Thorndike-Lorge frequency count (1944). Frequency
of words occurring 50-99 and 100 or more times per
million are given only as 50+ and 100+, respectively.
(b) Frequency of the quantitative adverbial meaning.

The careful choice of the experimental definitions
of any theoretical construct is felt to
be of extreme importance, whether a simple
mathematical model such as the present one
or some more general theory is being tested.
In fact, one may speculate on the possibility
of many potentially worthwhile formulations
being abandoned because of failure to make
the best possible choices at just this point.
Consequently, considerable care was used in
selecting the words to be included in the
study since it was felt that only the true,
dimensionless intensive adverbs could be
classified with any degree of confidence as
multipliers, and it appeared wise at this early
stage to limit the adjectives to a single dimension
of connotation. There are undoubtedly
other adverbs that could have been substituted
for those used. “Highly,” “moderately,”
and “fairly” come to mind, and there
are undoubtedly numerous adjectives that
would have served. The adverbs actually
chosen were selected because they are essentially
“dimensionless,” purely quantitative in
their usage. “Unusually” is a possible exception
to this, but it was the experimenter’s
belief that the connotation of rarity which
this word seems to imply is becoming lost in
many contexts and thus would not constitute
a complication. “Completely,” on the other
hand, was rejected because of its implication
of fullness, entirety, and conclusiveness.

The adjectives were chosen on the basis of 
their having little in the way of connotations 
other than the evaluative. The adjectives 
used are not completely free of other connota- 
tions, but they were felt to be relatively pure 
in this respect. 
The reader who has inspected the list of
words may be dismayed at the thought of
rating combinations such as “extremely average”
or “slightly ordinary.” These combinations
were only hesitantly included, but the
completeness of the experiment won out over
good usage. We may anticipate later discussion
somewhat and report that these two adjectives
turned out to have $s_j$ values very close
to zero. This may be the reason for these
words sounding so odd when modified, since
there is no good reason to try to stretch or
multiply anything which is nearly zero.

The stimuli were arranged in random order,
subject to the restriction that, except for the
filler items, neither the same adjective nor the
same adverb could be used in successive items.
The subjects rated the stimuli on an eleven-point
scale, from most unfavorable through
neutral to most favorable. The subject’s
task was made more concrete by emphasizing
in the directions that the stimuli were all to be
rated in terms of how the subject would interpret
them on reading and that they were all
to be applied to people. The important parts
of the directions, including example items,
may be seen in Fig. 1.


The most important means of human communication is by means of words,
and yet little of a scientific nature is known about how people go about
using and interpreting them. This experiment is an attempt to find out
about one aspect of this problem.


SECTION I


To imply favorable or unfavorable opinions about people, many different
words and phrases are used, depending not only on the specific context or
situation, but also on the degree of the judgment. It is the way in which
these different degrees are communicated which interests us in this experiment.
Suppose, for example, that someone were described as "Very respectable."
How favorable would you feel this was? If you felt that it implied a medium
degree of favorableness, you would put a cross in the box labeled 3, as indicated
in the following marked sample item.


                    Most                                   Most
                    Unfavorable        Neutral        Favorable

Very respectable    [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [X] [ ] [ ]
                    -5  -4  -3  -2  -1   0   1   2   3   4   5

Now imagine that you were to see someone described as "Mediocre." If
you thought this indicated a mildly unfavorable description, you would put
your cross in the box labeled -1, as indicated in the following example.


                    Most                                   Most
                    Unfavorable        Neutral        Favorable

Mediocre            [ ] [ ] [ ] [ ] [X] [ ] [ ] [ ] [ ] [ ] [ ]
                    -5  -4  -3  -2  -1   0   1   2   3   4   5

Throughout this booklet you will find words and phrases printed on the
left side of the page. You are to imagine that, in your reading, they have
been applied to a person and decide how strongly favorable or unfavorable a
statement is meant. Then you are to put a cross in one of the boxes to indicate
this degree. You will notice that all the words are applicable to a
person but not necessarily in the same situations. They all carry, though,
implications of favorableness or unfavorableness, and it is on this quality
that you are to make your judgments.


FIG. 1. Directions to the rating scale section.


TABLE 2

STIMULI SCALED BY BOTH PAIRED COMPARISONS
AND SUCCESSIVE INTERVALS

Quite ordinary            Extremely nice
Slightly immoral          Unusually pleasant
Rather contemptible       Pretty good
Very inferior             Somewhat admirable
Decidedly bad             Average

Two forms of the questionnaire were used. 
Their only difference was a simple permutation
of items, i.e., Items 1 through 102 of
Form A were made Items 103 through 204 of
Form B, and Items 103 through 204 of Form 
A were made Items 1 through 102 of Form B. 

Part II was a paired comparison schedule 
using a sample of ten of the combinations, re- 
sulting in 45 paired comparison judgments. 
The combinations used are listed in Table 2. 

The questionnaire was pretested on a group 
of secretaries, clerical assistants, and research 
assistants at Educational Testing Service and 
was found to meet satisfactorily the require- 
ments of comprehensibility and time limits. 


Subjects and Administration 


The subjects used were introductory psy- 
chology students at Wayne State University, 
Princeton University, and Dartmouth College. 
The administration of the questionnaire took 
place during a regular class period, and, except 
for one section of about 40 subjects at Wayne, 
the experimenter personally administered the 
questionnaire. Two hundred and eighteen 
subjects, about half of them men and the 
other half women, were tested at Wayne; 186, 
all male, at Princeton; and 133, all male, at 
Dartmouth. 

A few subjects finished the questionnaire in 
less than 20 minutes, about half in 35 minutes, 
and only one failed to complete it during the 
fifty-minute period. There were no evidences 
of lack of understanding of the task on the 
part of the subjects during the administration. 

The completed questionnaires were exam- 


ined for evidences of pattern marking and the 
presence of an appreciable number of unusual 
responses. If a questionnaire showed an 
obvious pattern such as all responses in a 
single category, simple alternation in the 
paired comparison section, pattern marking in 
the rating scale section, or multiple responses 
to a large proportion of items, it was eliminated
from further analysis. Also, if a sample
page contained more than two unusual responses
such as rating “inferior” as highly
favorable, the entire questionnaire was exam- 
ined for consistency. If such responses were 
consistent throughout for particular adjec- 
tives, the questionnaire was included in the 
analysis. If, on the other hand, the subject 
was not consistent in this rating, the question- 
naire was thrown out on the grounds that 
this indicated a lack of either understanding 
or cooperation. 

The paired comparison and the rating scale 
sections were examined separately, so that it 
was possible for a paired comparison section 
to be included for analysis but not the rating 
scale, and contrariwise. At Wayne, nine 
paired comparison and five rating scale sec- 
tions were rejected on the basis of one of the 
criteria; three of each at Princeton and four 
of each at Dartmouth were also rejected. 

The responses in the remaining 525 ques- 
tionnaires were then punched into IBM cards. 
An analysis of variance showed that differ- 
ences in the mean ratings of the same items on 
the two different forms could be attributed to 
differences in the individuals given the forms, 
so responses to Forms A and B were pooled. 
The data from each school, however, were
treated separately throughout the subsequent 
analysis. 


Methods of Deriving Psychophysical 
Scale Values 


The testing of the main hypothesis was to be 
done using successive intervals scale values 
derived from the rating scale items since it 
was felt that, in order to demonstrate a strong 
quantitative relationship such as multiplica- 
tion, it would be necessary to have numbers 
which as nearly as possible fit the axioms of 
the algebra of real numbers. In the more 
familiar classification of Stevens’ paper (1951, 
Ch. 1), we are trying to derive a ratio scale. 

As discussed by Gulliksen (1946), paired 
comparison scales derived using the law of 
comparative judgments are “distance” scales
in the sense that the stimuli are ordered along 
a continuum and the distances between the 
stimuli are an additive scale. That is to say, 
given stimuli A, B, C, the stimuli have an 
order such as A > B > C, and the distances
between all pairs of stimuli are consistent.
In this example, this consistency of distances
would imply


AB + BC = AC


The property of additivity of distances between
points is a necessary, although not a
sufficient, property of a ratio scale. Accord- 
ingly, the use of the law of comparative judg- 
ments to arrive at scale values appeared de- 
sirable, assuming that subjects will measure 
distances between magnitudes in the same 
way they measure the magnitudes themselves. 
The method of paired comparisons, however, 
would have involved a completely impractical 
number of judgments with the number of 
stimuli necessary for the present study. For- 
tunately, the method of successive intervals 
has been found to result in scale values which 
are linearly related to comparative judgment 
scale values (Saffir, 1937), so this was decided 
upon as the scaling method to be used; therefore,
the main part of the questionnaire was
constructed as a rating scale. It should be 
noted that some method such as assigning 
scale values by simply averaging the ratings 
given by the subjects would not generally be 
satisfactory in deriving the equivalent of a 
comparative judgment scale. It would be so 
only if successive intervals scaling would result 
in intervals of equal size. This is not usually 
found to be even approximately true. 
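The impracticality of a complete paired comparison design is easy to quantify. The short sketch below (the function name `n_pairs` is ours) simply counts the unordered pairs among n stimuli, assuming each pair is judged once.

```python
from math import comb


def n_pairs(n_stimuli):
    """Number of distinct pairs when every stimulus is compared once with every other."""
    return comb(n_stimuli, 2)


if __name__ == "__main__":
    print(n_pairs(10))    # the ten-stimulus paired comparison section
    print(n_pairs(150))   # all 150 adverb-adjective combinations
```

Ten stimuli require 45 judgments, the figure given for Part II, whereas the 150 combinations of Part I would require 11,175.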

The short paired comparison section of the 
questionnaire was included for the purpose of 
verifying that the comparative judgment and 
successive intervals scale values of the stimuli 
used in this study actually were linearly re- 
lated. 

In deriving the successive intervals scale 
values, the theory and procedure of Diede- 
rich, Messick, and Tucker (1957) were fol- 
lowed, utilizing the punched card procedure 
described by Messick, Tucker, and Garrison 
(1955), with slight modifications. The final 
iteration in the successive intervals solution 
was performed on the IBM Card Programmed 
Calculator at Princeton's Forrestal Research 
Center. The paired comparison scale values 
were hand calculated using Gulliksen’s in- 
complete data solution (1956a). 


The Matrix Solution for the Adverb 
and Adjective Values 


The method of fitting slopes and intercepts 
suggested by the linear relation of the scale 
values of the combinations to those of the 
unmodified adjectives furnishes an adequate 
test of the model, but it is quite sensitive to 


errors of measurement of the scale values of 
the unmodified adjectives. Consequently, 
a matrix solution which utilizes all the inter- 
relations of the scale values was derived. 

Insofar as the formulation is correct, the X 
matrix described earlier can be represented as 
the product of two matrices, C and S: 


$X = CS$


If there are $k$ adverbs, $C$ will be a $k + 1$ by 2
matrix; its first column will contain the “multiplying
values” of the adverbs, including a
value of unity in the first row as the “multiplying
value” of the unmodified form of the adjectives,
and its second column will be a constant
value of unity. Correspondingly, if
there are $n$ adjectives, $S$ will be a 2 by $n$
matrix with the psychological or algebraic
scale values of the adjectives as its first row
and the constant $K$ as the second. Given a
data matrix $X$, approximations to $C$ and $S$,
$\hat{C}$ and $\hat{S}$, respectively, can be found. The degree
to which $\hat{C}$ and $\hat{S}$ reproduce $X$ and have
the hypothesized characteristics indicates the
degree to which the data support the model.
The determination of C and $ takes place 
as follows. Utilizing factor analytic tech- 
niques, rank-two matrices P and Q are deter- 
mined such that X = PQ within as close an 
approximation as possible. Then trans- 
formations T and 7— are found which give 
least-squares approximations to C and S, 
respectively : 
PT . 
T"0 =: 
X =CS 
In order to find T and JT, 72, the secona 


column of T, is computed, using the following 
formula :5 


1 
= ——— (P’P)-1P’ 
Ts = 7 (PP) P(A) 


where 
: 
an 
= (G2 — és2)? 
,_- es 
i-1 


and may be computed by the formula 


1 — ¢ =~ ([1)JP(P’P)“P’(1) 


5 In the formulae, the expression (1) means a column vector
consisting entirely of 1’s and [1] means a row vector of the same
type. They act as summators over columns and rows respectively.

The second row of T⁻¹, (T⁻¹)_2, is computed
using the similar expression


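\[ (T^{-1})_2 = \frac{1}{1 - d}\,[1]Q'(QQ')^{-1} \]

in which

\[ d = \frac{\sum_{j=1}^{n} (\hat{s}_{2j} - \bar{s}_2)^2}{\sum_{j=1}^{n} \hat{s}_{2j}^{\,2}} \]

and may be computed by means of the formula

\[ 1 - d = \frac{1}{n}\,[1]Q'(QQ')^{-1}Q(1) \]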


The remaining elements of T and T⁻¹ can be found by utilizing the
relationship between the elements of a transformation matrix and
those of its inverse and the additional restriction that c_0, the
“multiplying value” when the adjective is unmodified, is unity. A
proof of these formulae and a more detailed discussion of their
properties is given in an unpublished manuscript by the present
author.6
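The least-squares step for the second column of T can be sketched in a few lines. The sketch assumes the formula reads T_2 = (1/(1 - c)) (P'P)^{-1} P'(1), with 1 - c = [1]P(P'P)^{-1}P'(1) / (k + 1); this is a reading of a partly illegible source, so treat it as an assumption. The function name and the small P matrix are hypothetical. P is built so that a column of unities lies exactly in its column space, in which case c should be zero and PT_2 should reproduce the unities exactly.

```python
def second_column_of_T(P):
    """Return (T_2, c) for a matrix P with two columns, given as a list of rows.

    T_2 = (P'P)^{-1} P'(1) / (1 - c), where
    1 - c = [1] P (P'P)^{-1} P'(1) / (number of rows of P).
    """
    n_rows = len(P)
    # Elements of the symmetric 2 by 2 matrix P'P.
    a = sum(p[0] * p[0] for p in P)
    b = sum(p[0] * p[1] for p in P)
    d = sum(p[1] * p[1] for p in P)
    det = a * d - b * b
    # P'(1): the column sums of P.
    g0 = sum(p[0] for p in P)
    g1 = sum(p[1] for p in P)
    # u = (P'P)^{-1} P'(1), using the explicit inverse of a 2 by 2 matrix.
    u0 = (d * g0 - b * g1) / det
    u1 = (a * g1 - b * g0) / det
    one_minus_c = (g0 * u0 + g1 * u1) / n_rows
    return [u0 / one_minus_c, u1 / one_minus_c], 1.0 - one_minus_c


# Hypothetical P whose columns are 2*c_i + 1 and c_i + 1 for c = 1.0, 0.5, 1.25, 1.5.
P = [[3.0, 2.0], [2.0, 1.5], [3.5, 2.25], [4.0, 2.5]]
T2, c_value = second_column_of_T(P)
fitted_unities = [p[0] * T2[0] + p[1] * T2[1] for p in P]
print(T2, c_value, fitted_unities)
```

With real data the column of unities will not lie exactly in the column space of P, and c then measures the relative spread of the fitted second column about its mean.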


RESULTS 
Scale Characteristics of the Data 


The first observation to become 
apparent from the data was the high 
degree of unanimity of the judgments. 
The common practice in scaling ex- 
periments is to give zero weights to 
normal deviates based on proportions 
less than .05 and greater than .95. 
In the present case this would have re- 
sulted in a very drastic reduction in 
the amount of usable information. 
Accordingly, proportion limits of .028 
and .972 for Wayne, .027 and .973 for 
Princeton, and .023 and .977 for Dart- 
mouth were set. The variation in 
these limits is due to the different Ns 
for the three samples; there was also 
some small variation in p within 
schools as N fluctuated slightly due to
omissions. These proportions correspond to Müller-Urban weights of
about .13, roughly one-fifth of the maximum.

6 N. Cliff. Factoring rectangular matrices into theoretical
components. Unpublished manuscript.

Even with these somewhat liberal- 
ized limits the number of usable pro- 
portions was much smaller than the 
maximum possible. For the 204 
items and eleven categories of the suc- 
cessive intervals section, there could 
be 2040 usable proportions. The 
numbers which remained after apply- 
ing the limits described were: Wayne, 
845; Princeton, 678; Dartmouth, 760. 
Since 416 parameters were to be 
fitted for each school, this leaves 429, 
262, and 344 degrees of freedom, 
respectively. 
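The counts in this paragraph can be checked with simple arithmetic. The sketch below only restates the figures given in the text; the variable names are illustrative.

```python
# Figures quoted in the text for the successive intervals section.
ITEMS = 204
CATEGORIES = 11
PARAMETERS_PER_SCHOOL = 416

# Eleven categories have ten boundaries, hence ten cumulative proportions per item.
max_usable = ITEMS * (CATEGORIES - 1)

usable = {"Wayne": 845, "Princeton": 678, "Dartmouth": 760}
degrees_of_freedom = {school: count - PARAMETERS_PER_SCHOOL
                      for school, count in usable.items()}

print(max_usable)
print(degrees_of_freedom)
```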

Similar consistency of judgments 
was found for the paired comparisons. 
Upon tabulating the data it was found 
that the comparison “average-unusually pleasant” had been omitted
through a clerical error, so that only
44 different comparisons were made 
by the subjects. Of these 44, only 22 
fell within the proportion limits in the 
Wayne data, 19 in the Dartmouth 
data, and 15 in the Princeton data. 

Two earlier studies, Mosier’s (1941) 
and that of Jones and Thurstone 
(1955), had shown that it was possible 
to scale verbal material by successive 
intervals provided care was used in 
selecting unambiguous stimuli. The 
successive intervals model appeared 
to be adequate for the present data, 
but the computational method used 
does not furnish direct evidence of 
this. However, successive intervals 
scaling may be considered a simul- 
taneous normalization of the distribu- 
tion of responses to each stimulus. 
Since the first two moments of these 
response distributions are fitted to the 
data, the fit must be quite good except 
in cases where the distributions have 
more than one mode, have no distinct 
mode, or have skewnesses opposite in 
sign to those of other stimuli with 
about the same mean. Of these, bimodality generally has the most
serious effect. The distributions of responses to the stimuli used in
the present study did not display any of these defects to any
important extent.

TABLE 3

SCALE VALUES OF CATEGORY BOUNDARIES

[Rows: Wayne, Princeton, Dartmouth; columns: category boundaries.
Only fragments of the entries survive in this copy: 1.46, 1.48, 1.87,
1.92.]

The scale values of the category 
boundaries are given in Table 3, 
where it can be seen that they were 
not found to be equally spaced. This 
indicates that positive verification of 
our hypothetical model, if observed, 
could not also have been found using 
simple averages of the ratings of the 
stimuli. 

Jones and Thurstone, in comparing 
their results to those of Mosier, ob- 
served that the distance between cate- 
gory boundaries may depend in part 
on the directions given the subjects. 
It is usually found in successive inter- 
vals scaling that the longest intervals 
are found at the ends of the scale and 
that they become shorter as the center 
is approached. In both Mosier’s 
study and the present one, one of the 
middle categories was also found to be 
long. Jones and Thurstone, in dis- 
cussing Mosier’s results, contended 
that this may be due to the directions 
which call attention to the fact that 
there are “favorable” and ‘‘unfavor- 
able’’ words and phrases and that 
these are to be rated on the right and 
left halves of the scale, respectively. 
Their own directions did not do this 
and they did not find a long middle 
category. This contention seems to 
be borne out here in that the directions did call attention to the
favorableness or unfavorableness of the
combinations (see Fig. 1), and, as 
shown in Table 3, a middle category 
did turn out to be long, although in the 
Princeton group the long category was 
the fifth or —1 category rather than 
the neutral one. 

The fit of scale values to the paired 
comparison data was also close. 
Given in Table 4, along with the final 
scale values of the paired comparison
stimuli, are the error terms E for these
stimuli, and the number of compari- 
sons falling within the proportion 
limits is given as W. The E is the 
standard error of estimating only 
those obtained normal deviates which 
fell within the prescribed limits from 
the derived scale values of the stimuli. 
The E are reasonably small. 


Reproducibility of the Data 


TABLE 4

SCALE VALUES OF PAIRED COMPARISON STIMULI

Stimulus                 Wayne
Extremely nice           2.739
Unusually pleasant       2.087
Pretty good              1.458
Somewhat admirable       1.495
Average                   .400
Quite ordinary            .025
Slightly immoral        -1.291
Rather contemptible     -1.687
Very inferior           -1.767
Decidedly bad           -2.195

W                        22
E                         .210


TABLE 5

CORRELATIONS BETWEEN SCALE VALUES OF SETS OF 15 REPEATED ITEMS

Sample        Error of Substitution
Wayne         .039
Princeton     .055
Dartmouth     .041


As mentioned earlier, fifteen of the combinations were included twice
to form a basis for estimating the reliability of the ratings. The
members
of each of these pairs were randomly 
put into two groups and the scale 
values correlated. The resulting co- 
efficients are given in Table 5, and, 
since all three are about .999, the reli- 
ability of the judgments is shown to 
be very high. The 150 scale values 
from each school which were to be 
used in subsequent analyses were also 
correlated among the three schools to 
get a measure of their comparability 
across the samples. From Table 6, 
they can be seen to range from .993 to 
.998, indicating the high degree of 
agreement to be expected among quite 
comparable groups of subjects. 
These coefficients, however, tend to 
be somewhat smaller than the
reliabilities. 

TABLE 6

INTERCORRELATIONS OF 150 SCALE VALUES TO BE USED IN TESTING
MULTIPLICATIVE COMBINATION

              Princeton   Dartmouth
Wayne         .996        .998
Princeton                 .993


TABLE 7

BETWEEN-GROUP INTERCORRELATIONS OF PAIRED COMPARISON SCALE VALUES

              Princeton   Dartmouth
Wayne         .991        .999
Princeton                 .989


The intercorrelations among schools of the paired comparison scale
values are given in Table 7. These are seen to be about .99 and quite
comparable to the corresponding coefficients given in Table 6 for the
successive intervals. Since the stimuli used for the paired
comparisons were also included in the successive intervals
questionnaire, it
was possible to compare the values ob- 
tained by the two methods. The cor- 
relations thus obtained are given in 
Table 8, and they too are .99 or larger, 
indicating that similar processes were 
being used by the subjects by the two 
means and that there is an almost 
perfect linear relation between the 
successive intervals and paired com- 
parison scale values. 


Testing the Model 


The three sets of successive intervals 
scale values shown in Table 9 were 
those to be used to test the model. 
The reader may be interested to note 
by inspecting columns of the table 
that the effect of the adverbs on most 
of the adjectives is quite marked. 
The difference between the scale value 
of an adjective modified by “slightly” 
and the same adjective modified by 
“extremely” is usually almost one- 
fourth of the length of the entire scale. 
“Ordinary” and “average,” on the 
other hand, remain quite stable, al- 
though there is a noticeable tendency 
for them to move slightly toward the 


TABLE 8 
CORRELATIONS BETWEEN PAIRED 
COMPARISON AND SUCCESSIVE 
INTERVAL SCALE VALUES 








Group 





Wayne 
Princeton 
Dartmouth 
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ADVERBS AS MULTIPLIERS 


negative with the more extreme 
adverbs. 

A preliminary check on the model 
was made by plotting x_{0j}, the scale
values of the unmodified adjectives,
against x_{ij}, those of the adjectives in
combination with some particular ad- 
verb. These were satisfactorily lin- 
ear, although there was a tendency 
for the scale values to bunch at the 
ends of the scale when the adjectives 
were paired with extreme adverbs. 

The next step in the analysis was 
the determination of the rank of X 
and finding the matrices P and Q. 
This was done by multiplying 


XX' = R


and obtaining the two largest factors 
of R by an adaptation of the principal 
components method to obtain P. Q 
was computed by premultiplying X 
by the matrix of latent vectors of P’. 

The size of the first two latent roots 
(sum of squares of factor loadings) 
indicates the degree to which X can be 
approximated by the product of two 
rank-two matrices. It is a fundamental theorem of factor analysis
(Eckart and Young, 1936) that the sum of the diagonal elements of R,
its “trace,” which is also the sum of the squares of elements of X, is
the sum of the latent roots of R. These roots must all be positive or
zero. Therefore, the sum of the first two roots can be compared to the
trace to see how nearly the sum of squares of the x_{ij} can be
accounted for by the first two orthogonal components. Table 10 lists
the two
largest roots of each of the three 
matrices and the percentage of the 
trace accounted for by them. Since 
all of the percentages are greater than 
99.9, X can be very closely approxi- 
mated using only two factors in P and 
Q, as was hypothesized. 
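The proportions of the trace reported in Table 10 can be recomputed from the roots and traces in the same table. The sketch below uses only those published figures; the dictionary layout and names are illustrative.

```python
# (largest root, second root, trace) for each sample, as listed in Table 10.
table_10 = {
    "Wayne": (654.46, 10.04, 665.01),
    "Princeton": (668.35, 7.72, 676.57),
    "Dartmouth": (662.21, 8.24, 670.95),
}

share_of_trace = {}
residual = {}
for school, (root_1, root_2, trace) in table_10.items():
    # Fraction of the total sum of squares carried by the first two components.
    share_of_trace[school] = (root_1 + root_2) / trace
    # Sum of squares left over after the two components are removed.
    residual[school] = trace - root_1 - root_2

for school in table_10:
    print(school, round(share_of_trace[school], 4), round(residual[school], 2))
```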

TABLE 10

LATENT ROOTS OF XX'

                           Wayne     Princeton   Dartmouth
a_1^2                      654.46    668.35      662.21
a_2^2                       10.04      7.72        8.24
Residual                      .51       .50         .50
Trace                      665.01    676.57      670.95
(a_1^2 + a_2^2) / Trace     .9992     .9993       .9993


The sum of squares of differences between theoretical and obtained
x_{ij} is equal to the figure given in Table 10 as the residual trace
of the R matrices after extracting the two factors. The standard
error of substitution of the theoretical for the obtained scale
values of the 150 combinations is then

\[ \sqrt{\frac{.50}{150}} \approx .0577 \]

for Princeton and Dartmouth, and

\[ \sqrt{\frac{.51}{150}} \approx .0583 \]

for Wayne. These values are seen
to be only slightly larger than the cor- 
responding figures given in Table 5 
as the errors of substitution resulting 
from substituting the scale values of 
one set of repeated items for the other. 
Thus, there seems to be only a small 
amount of reliable variance which is 
not accounted for by the two factors. 
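The standard errors of substitution follow directly from the residual traces in Table 10 and the 150 scale values. A minimal check, assuming the error is the square root of the residual sum of squares divided by 150, and placing the results beside the Table 5 figures as read from this copy:

```python
import math

N_COMBINATIONS = 150

# Residual traces from Table 10 and repeat-item errors of substitution from Table 5.
residual_trace = {"Wayne": 0.51, "Princeton": 0.50, "Dartmouth": 0.50}
repeat_item_error = {"Wayne": 0.039, "Princeton": 0.055, "Dartmouth": 0.041}

model_error = {school: math.sqrt(value / N_COMBINATIONS)
               for school, value in residual_trace.items()}

for school in residual_trace:
    print(school, round(model_error[school], 4), repeat_item_error[school])
```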

Next, the transformations T and T⁻¹ were determined for each sample
and applied to the P and Q matrices to determine Ĉ and Ŝ. The Ĉ and Ŝ
matrices for each group are given in Tables 11, 12, and 13.

TABLE 11

WAYNE ADVERB VALUES MATRIX Ĉ AND ADJECTIVE VALUES MATRIX Ŝ'

Ĉ
[Rows: (Unmodified), Slightly, Somewhat, Rather, Pretty, Quite,
Decidedly, Unusually, Very, Extremely. The entries are missing from
this copy.]

Ŝ'
Adjective        s_j       K
Evil            -1.246     2.082
Wicked          -1.158     1.952
Contemptible     -.913     1.746
Immoral         -1.177     1.936
Disgusting       -.806     1.617
Bad             -1.025     2.032
Inferior         -.813     2.008
Ordinary         -.078     2.083
Average          -.040     2.121
Nice             1.007     1.742
Good             1.078     1.752
Pleasant         1.001     1.835
Charming          .802     2.136
Admirable         .983     2.001
Lovable           .836     2.173


The goodness of the fit of the data to the model is indicated by the
degree of conformity of the obtained Ĉ and Ŝ matrices to the
hypothesized ones. If the model is to be ideally verified, then not
only must there be only two factors, but the second column of Ĉ must
contain all unities and the second row of Ŝ must be a constant value
K, which, it will be remembered, is the difference between the
arbitrary zero point of the scale values and the “true” psychological
zero point. Departures from these constant values
indicate the degree to which the data 
do not confirm the model. Examina- 
tion of the tables shows that the fit 
was excellent for the adverb matrices, 
but that for the adjectives, while 
good, was noticeably less exact. The standard deviations from the
theoretically constant values of 1.000 for the second column of Ĉ are
of the order .01, while the standard deviations of the theoretically
constant K in the second column of the Ŝ' tables are of the order .16
(for convenience, Ŝ is presented as Ŝ' in the tables). All three
groups gave highly equivalent results.


TABLE 12

PRINCETON ADVERB VALUES MATRIX Ĉ AND ADJECTIVE VALUES MATRIX Ŝ'

[Rows of Ĉ: the unmodified form and the nine adverbs; rows of Ŝ': the
15 adjectives, as in Table 11. The entries are missing from this
copy.]


TABLE 13

DARTMOUTH ADVERB VALUES MATRIX Ĉ AND ADJECTIVE VALUES MATRIX Ŝ'

Ĉ
[Rows: (Unmodified), Slightly, Somewhat, Rather, Pretty, Quite,
Decidedly, Unusually, Very, Extremely. The entries are missing from
this copy.]

Ŝ'
Adjective        s_j       K
Evil             -.993     1.972
Wicked           -.997     1.910
Contemptible     -.882     1.792
Immoral          -.954     1.910
Disgusting       -.902     1.715
Bad              -.796     1.907
Inferior         -.861     2.037
Ordinary         -.223     2.182
Average          -.211     2.195
Nice             1.011     1.739
Good             1.075     1.761
Pleasant          .974     1.860
Charming          .910     2.013
Admirable        1.086     1.892
Lovable           .812     2.207
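The stated order of magnitude for the spread of K can be checked against the K columns of the Wayne and Dartmouth tables. The values below are those columns as read from this copy, paired with the adjectives in table order (an assumption about the layout); the Princeton entries are not available. The population standard deviation is used, which is also an assumption, since the text does not say which divisor was applied.

```python
import statistics

ADJECTIVES = ["Evil", "Wicked", "Contemptible", "Immoral", "Disgusting",
              "Bad", "Inferior", "Ordinary", "Average", "Nice", "Good",
              "Pleasant", "Charming", "Admirable", "Lovable"]

# K columns of the adjective tables for two of the three samples.
K_wayne = [2.082, 1.952, 1.746, 1.936, 1.617, 2.032, 2.008, 2.083,
           2.121, 1.742, 1.752, 1.835, 2.136, 2.001, 2.173]
K_dartmouth = [1.972, 1.910, 1.792, 1.910, 1.715, 1.907, 2.037, 2.182,
               2.195, 1.739, 1.761, 1.860, 2.013, 1.892, 2.207]

sd_wayne = statistics.pstdev(K_wayne)
sd_dartmouth = statistics.pstdev(K_dartmouth)

# Adjective with the smallest K in each sample.
lowest_wayne = ADJECTIVES[K_wayne.index(min(K_wayne))]
lowest_dartmouth = ADJECTIVES[K_dartmouth.index(min(K_dartmouth))]

print(round(sd_wayne, 3), round(sd_dartmouth, 3))
print(lowest_wayne, lowest_dartmouth)
```

Both standard deviations come out near .16, and the smallest K belongs to the same adjective in both samples.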

The multiplying values of the adverbs are given in the first columns
of the Ĉ matrices. They are seen to vary from about .5 to 1.5. The
actual values obtained from the three sets of data vary somewhat, but
in general are quite comparable. In comparison to the unmodified form
of the adjectives, combinations with “slightly” and “somewhat” have
the smallest intensities, the multiplying values for these adverbs
being considerably less than unity. Combinations with “rather” and
“pretty” are also less extreme than the unmodified form but are much
nearer to it; “quite” makes adjectives just slightly more extreme,
while “decidedly” has a definite effect; “very” and “unusually” are
close together and stronger than “decidedly”; “extremely” is by far
the most effective of the intensives included in the list. The values
obtained in the experiment seem to agree with subjective impressions
of how they are used.

The s_j values derived for the adjectives correspond closely to
expectation. Those for the seven adjectives which could be termed
definitely unfavorable are relatively large negative numbers, while
those for the six favorable adjectives are large positive numbers
about equal in absolute value to the s_j of the unfavorable words.
“Ordinary” and “average” have s_j near zero but on the negative side.
This reflects the stability observed in their scale values and, as
observed earlier, perhaps offers an explanation of why these
adjectives sound peculiar when modified.

In one respect, however, the results for the adjectives cannot be
considered quite as neat as those for the adverbs: the theoretically
constant second columns of the Ŝ' matrices (the K values) are
somewhat variable. The departures from a theoretically constant value
might at first thought be dismissed as random errors of measurement,
but two things in addition to their magnitude argue against this.
First, the entries in the K columns of the Ŝ' show some consistency
across the three groups. Note, for example, that the K for
“disgusting” is the smallest in all three cases; also, “ordinary” and
“average” have among the highest entries in each of the three
groups. The correlations of these 
values over adjectives among the 
three groups bear this out. As can 
be seen from Table 14, they are ap- 
preciable. Such high correlations 
would hardly be expected if the vari- 
ation in K were a random error of 
measurement. 

The lack of correspondence between data and theory in respect to the
K values is also shown in comparing the s_j entries of the Ŝ' matrices
to the scale values of the unmodified adjectives given in the first
rows of the X matrices. According to the theory, these scale values
should be simply s_j + K within, say, an error comparable to that
indicated by the reliability measure. Examination of the two sets of
numbers shows that this will not be the case, for, if we subtract the
mean of the K from the x_{0j}, the resulting numbers differ somewhat
from the corresponding s_j.

TABLE 14

INTERCORRELATIONS OF VALUES OF K OBTAINED FOR ADJECTIVES

[Samples: Wayne, Princeton, Dartmouth. Coefficients as they appear in
this copy, with the pairing of samples not recoverable: .890, .846,
.896.]


The fairly small but consistent variation observed in the K is felt
by the author to be the only important
departure from the model and requires 
some attempt at explanation. The 
possibility of a computational artifact 
can be ruled out. The matrix solu- 
tion fits the data to the model in the 
least-squares sense, so that the vari- 
ance of the Ks has been made as small 
as possible. 

Several possible explanations of the 
variation present themselves. First, 
it may be that it is an artifact of the 
scaling task itself or perhaps even of 
the method of successive intervals. 
Detailed examination of the data re- 
vealed that, for the most part, the 
lack of fit results from failure of the 
extreme combinations to be pushed 
out as far as one would expect from 
the model. This could result from the 
subjects rating relatively moderate 
combinations near the ends of the 
scale and then not having any means 
of rating the really extreme ones any 
farther out. A more flexible means of 
securing the judgments than using 
printed questionnaires should remedy 
this. Alternatively, the successive 
intervals solution used, while it has 
many advantages, may be sensitive to 
the fact that the judgments used here 
were very consistent over subjects, so 
that the data were highly “incomplete.” If this resulted in
underestimating the discriminal dispersions of
the extreme combinations, then the 
result could be the pulling in of the 
ends of the scale as was observed. 
The use of the method on artificial 
data having the required characteris- 
tics is required to investigate this 
possibility, however. 

Another possible explanation lies 
in the fact that the model used here is 
based on the properties of Euclidean 
vector spaces. It may be that the 
properties of the space with which we 
are dealing here are only approximately those of the Euclidean type.
This possibility would introduce many 
complications, both theoretical and 
practical, however. 

Perhaps the simplest explanation of 
the variation in the K is that, con- 
trary to assumption, each of the ad- 
jectives has its own zero point rather 
than there being a common one for 
all the adjectives. In this case the 
basic formula would become 


\[ x_{ij} = c_i s_j + K_j \]


Here K_j represents the distance from
the arbitrary zero point to the psy-
chological zero point of the adjective.
This explanation has the advantage 
of fitting the available data without 
requiring further experimentation, but 
the author, at least, is reluctant to 
settle on it at this point because it 
detracts from the simplicity of the 
model. 
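One consequence of the alternative formula is worth making concrete: a matrix whose entries are c_i s_j + K_j is still the product of a two-column and a two-row matrix, so it remains of rank two even though the second row of S is no longer constant. That is consistent with two factors reproducing the data closely while the K column varies. The sketch below uses invented values (not the study's data) and checks that every 3 by 3 minor of such a matrix vanishes.

```python
from itertools import combinations

# Hypothetical values: one zero point K_j per adjective instead of a common K.
c = [1.0, 0.5, 1.25, 1.5]
s = [-1.0, -0.25, 0.75, 1.0]
K_j = [2.0, 1.75, 2.25, 1.5]

X = [[c_i * s_val + k_val for s_val, k_val in zip(s, K_j)] for c_i in c]


def det3(m):
    """Determinant of a 3 by 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def largest_3x3_minor(matrix):
    """Largest absolute 3 by 3 minor; zero for every matrix of rank two or less."""
    rows = range(len(matrix))
    cols = range(len(matrix[0]))
    return max(abs(det3([[matrix[i][j] for j in col_set] for i in row_set]))
               for row_set in combinations(rows, 3)
               for col_set in combinations(cols, 3))


largest_minor = largest_3x3_minor(X)
# A nonzero 2 by 2 minor shows the rank is exactly two, not one.
minor_2x2 = X[0][0] * X[1][1] - X[0][1] * X[1][0]
print(largest_minor, minor_2x2)
```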


CONCLUSIONS AND IMPLICATIONS 


The data and discussion of the pre- 
vious section were presented as argu- 
ments in favor of the hypothesis that 
adverbs and adjectives of specifiable 
types combine according to a multi- 
plicative rule. The consistent dis- 
crepancies between data and theory 
are felt to be small enough to warrant 
the acceptance of the hypothesis, at 
least as a good approximation. 

Using the results of this study as a 
base, there are further lines of research 
which would seem to bear promise of 
broadening the applicability of the 
hypothesis. The first of these is the 
expansion of it to other dimensions. 
If one were to define an abstract dimension of communicated size,
would “very small” be “very” times “small”? The subjective impression
of the way the words combine is the same as in the evaluative
dimension: “very small” is smaller than “small” and “very large” is
larger than “large,” and so on. This particular
example is especially interesting be- 
cause of the implication that there are 
positive and negative quantities of 
size whereas, physically, size is a 
positive quantity. It may be that the 
intensive adverbs would have the 
same multiplying value in all dimen- 
sions. This is intuitively the most 
satisfactory way of conceptualizing 
the relationship, but one might find 
that they would have different multi- 
plying values in different dimensions 
even though their action were multi- 
plicative. 

Investigation might also be fruitful into the application of the
multiplicative rule to combinations other than those containing the
common intensive adverbs. Brunot (1922), in his attempt to redefine
the grammatical classifications of French, speaks of a general class
of expressions of degree. It might be that the English equivalents of
most or all of the expressions he includes there would act in this
way. It is, of course, very important to extend the findings for
English to other languages. While it may be that the rules of
combination might be different for languages with a structure other
than that of English, some similar type of quantitative abstraction
might easily hold.7
Combinatory rules other than the one forming the basis of this
investigation can perhaps also be found for other types of
combinations in English. For instance, adjective-noun combinations
usually have characteristics of both the adjective and the noun, so
the rule for them might be some form of addition. A combination of
adjectives might also act in this way.

7 Harold Gulliksen, of Princeton University, is currently gathering
data from several European countries to see whether translations of
the words used here combine in the same way in other languages.

Extending the search for combin- 
atory rules to various dimensions, 
languages, and parts of speech, then, 
seems from the results of this study to 
hold out a hope for a fairly thorough 
formal representation of how the emo- 
tive aspects of language are communi- 
cated. Finding such rules requires 
fairly exact formulation of them be- 
fore undertaking experimentation. 
Also, it is necessary to have measuring 
instruments which give data accurate 
enough to show whether or not the 
theories hold, and the nature of departures from the theory when it is
shown to be a good general statement. 
The method of successive intervals 
seems from the present results to be 
such a method. 

The results of this study also have 
relevance to the general problem of 
psychological measurement. In gen- 
eral, measurement may be said to be 
the process of relating observables to 
some ordering system. The field of 
real numbers is considered the ideal 
ordering system, since it is possible to 
perform operations on it and show re- 
lations with it which are not possible 
using other systems. However, it is 
usually not possible to justify treating 
measured quantities as completely iso- 
morphic with the real numbers; there- 
fore, various of the real number axioms 
are relaxed or termed inadmissible in 
order to prevent the appearance of 
nonsensical theoretical results in constructing scales. This lack of
isomorphism has been especially true of psychological data, so types
of scales are differentiated on the basis of which of the axioms are
altered or relaxed. The distinctions made by Stevens (1951, Ch. I)
and elaborated in some respects by Coombs (1951) are examples of the
recognition of such disparities. As a result, certain operations such
as addition or division are termed inadmissible, weak ordering is
substituted for strong, and so on. Weitzenhoffer (1951) presents a
cogent discussion of the algebraic axioms in relation to
psychological data.

8 Analysis of the filler items in the questionnaire, accomplished
since the preparation of this manuscript, indicates strongly that
“very, very” is “very-squared.” Combinations which contain two
different adverbs, e.g., “very slightly admirable,” seem to operate
by having the two adverbs combine exponentially: “slightly” to the
“very” power, but data on the latter rule are not clear-cut.

It is felt that the data quantities of this study represent an
unusually close approximation to the real numbers, for the operation
of multiplication requires both a zero point and a unit of
measurement. In a very real sense, “extremely good” may be said to be
about one-and-a-half times as good as “good.” The multiplication
seems to be of the scalar type since the adjectives represent the
same kind of meaning whether modified or not. Note that this might
not have been found to be the case had more “meaningful” adverbs such
as “ridiculously,” “sinisterly,” or “constantly” been used.

The zero point on the scale seems, for practical purposes, to be
slightly above the adjective “average” for these groups. A
dimensional unit, analogous to foot or hour, may be set at any
convenient point, whereupon all the scale values, representing
degrees of favorableness or unfavorableness, are to be expressed as
multiples of this point.

This discussion of zero points and 
units may be premature, especially in 
view of the possibility that each ad- 
jective has its own scale, but the evi- 
dence presented here makes it seem 
not unreasonable. If further experimental work verifies the present
findings in the most important aspects 
but shows that each adjective does 
indeed have its own zero point, then 
the discussion of zero points and scale 
units would have to be made specific 
to the adjective. Perhaps the adjec- 
tive “modified” by the prefix non- 
or a- (not the negatives un- or im-) 
would then be the zero point and the 
dimensional units would have to be 
expressed in terms of good-ness, 
immoral-ness, and so on. 


SUMMARY 


This study set out to test the hy- 
pothesis that the common adverbs of 
degree multiply the intensity of the 
adjectives they modify. That is, 
there is a number associated with each 
adjective and with each adverb; the 
intensity of each combination is the product of the numbers
associated with the words. It was assumed that
the relationship should be reflected by 
the psychophysical scale values of ad- 
verb-adjective combinations. Ac- 
cordingly, three groups of subjects 
rated the combinations of nine inten- 
sive adverbs with 15 evaluative adjec- 
tives on the favorable-unfavorable 
dimension. Scale values of the stimuli 
were determined by the method of 
successive intervals. 

The scale values obtained were 
found to be highly reliable, highly com- 
parable between groups, and highly 
correlated with scale values obtained 
by paired comparisons. A matrix 
method was employed to test the hy- 
pothesis of multiplicative combination 
and to determine the adverb and ad- 
jective values. The degree of cor- 
respondence between hypothesis and 
data was found to be very close in all 
three groups. This was evidenced 
by the fact that matrices of scale 
values could be reproduced with a 


high degree of precision using only 
two factors, as was hypothesized, and 
that certain hypothetically constant 
values derived from the data were 
very nearly so. Some consistent dis- 
crepancies between hypothesis and 
data were pointed out. 

The relationship discovered is akin 
to the scalar multiplication of vector 
algebra. A zero point is implied for 
the numbers associated with the ad- 
verbs and those associated with the 
adjectives. Scale units analogous to 
physical units are implied for the ad- 
jective numbers, but the adverb numbers are “unitless” scalars.
Several
extensions of this research are sug- 
gested. 
In 1861, Paul Broca described two
cases of loss of speech from cerebral 
injury and presented the autopsy find- 
ings before the Anatomical Society of 
Paris. He labeled both cases aphemia, 
which he defined as loss of the faculty 
of articulated speech without impair- 
ment of comprehension or loss of 
intelligence. However, neither the 
medical histories nor the findings 


Broca reported indicate he was dealing 
with such circumscribed impairment. 
Although both brains were extensively 
damaged, as Marie pointed out in his 
critical paper of 1906, Broca ascribed 


loss of speech to lesions in the second 
and third frontal convolutions. 

Nine years later the researches of 
Fritsch and Hitzig in Germany, and 
of Ferrier in England, established the 
existence of a motor cortex in the 
frontal lobe of various mammals (Ful- 
ton, 1938). In 1874, Heschl traced 
the auditory radiations to the temporal 
lobe (Fulton, 1938), and Wernicke 
(1908) described sensory aphasia, which he considered to result from
temporal lobe lesions. During the same period, Munk and Schaefer
identified the visual cortex in the occipital lobe (Fulton, 1938).

*This research was done during the tenure of a Social Science
Research Council Faculty Fellowship of the second author. Much of the
computational work was performed by Joanne D’Andrea as a research
assistant employed by funds from a grant-in-aid for Behavioral
Science Research from the Ford Foundation. The administration of the
aphasia test battery was performed by the senior author, Virginia
Carroll, Barbara Street and Rudy Simmons, members of the staff of the
Aphasia Clinic of the Neurology Section of the Minneapolis VA
Hospital.

It was then reasoned that there was 
in the frontal lobe a center which 
contained cells in which were stored 
movements used in speaking, and 
another center in which were stored 
movements used in writing. In the 
temporal lobe were cells containing 
images of words heard, and in the 
parieto-occipital cortex were cells con- 
taining images of printed and written 
words. All of these centers were 
connected by fiber tracts. Thus it was 
possible to have a motor or a sensory 
aphasia resulting from destruction of a 
center (central aphasia), or a motor 
or a sensory aphasia resulting from 
destruction of a fiber tract (conduc- 
tion aphasia). 

Bastian (1887), following this tradi- 
tion, believed that damage to the 
auditory word-center resulted in pure 
word deafness, and damage to the 
visual word-center in pure word blind- 
ness. Damage of the glosso-kinaesthetic 
center produced pure aphasia (Broca’s) 
and of the cheirokinaesthetic center, 
agraphia. He described the following 
defects resulting from damage to the 
“commissural fibers” connecting sen- 
sory areas: 


1. Visual-auditory commissures: im- 
pairment of ability to name objects 
and read aloud. 

2. Auditory-visual commissures: impairment
of ability to write spontaneously
and to dictation.

3. Audio-kinaesthetic commissures: 
Broca’s aphasia, indistinguishable from 
that caused by damage of Broca’s area. 

4. Visual cheirokinaesthetic com- 
missures: impairment of ability to 
visualize forms. 

5. Audito-kinaesthetic commissures:
impairment of ability to write to
dictation.


Bastian’s schema may be taken as 
an example of the many classifications 
constructed by the diagram-makers, 
from Wernicke to the present day. 
They were based on the theory of a 
cortical mosaic, with discrete mental 
faculties residing in the cells of specific 
areas. Although some of the diagrams 
were simpler and others more complex, 
the procedures were the same: the
diagram was constructed, and the 
symptoms which should result from 
lesions at various sites were deduced.
Unquestionably the symptoms described
were observed, but others were inevitably
overlooked, for there was no
systematic or comprehensive examination
of patients. A later neurology
substituted terms such as functions 
represented for faculties contained in 
cells, but, as Gooddy (1956) has 
pointed out, without any real altera- 
tion of the basic concept of cortical 
function. All the schemata presumed 
a motor and a sensory dichotomy, and 
pure aphasias of various forms deter- 
mined by the anatomical sites of the 
lesions. It is interesting to note that 
textbooks, both in neurology and 
general psychology, present aphasia 
in about this fashion even today 
(Alpers, 1954; Brain, 1947; Grinker 
& Bucy, 1949; Morgan, 1956; Munn, 
1951). 

It was not until 1926 that Henry 
Head, a pupil of Hughlings Jackson, 
constructed a series of tests for aphasia
and systematically administered them
to 26 aphasic patients with head injuries 
resulting from gunshot wounds incurred 
in World War I. Intensive clinical 
study of these patients led him to 
consider aphasia a defect of symbolic 
formulation and expression. He rec- 
ognized four clinical forms of the 
disorder:


1. Verbal aphasia, resulting from 
lesions of the pre- and postcentral 
convolutions: characterized by loss of 
articulated speech, with comprehension 
impaired but recovering rapidly. 

2. Syntactical aphasia, resulting from 
a lesion of the upper temporal con- 
volution: characterized by jargon, 
slurred speech, impairment of rhythm 
and phrasal memory. 

3. Nominal aphasia, resulting from 
lesions in the region of the angular 
gyrus: characterized by loss of power 
to name and want of comprehension of 
the meaning of words. 

4. Semantic aphasia, localized in the 
supramarginal gyrus: characterized 
by disturbance of comprehension of 
significance of words and phrases as a 
whole. 


It is significant that with careful 
study of even a small number of 
patients, the traditional dichotomy be- 
tween motor and sensory aphasia began 
to disappear. Head’s work was limited 
by the fact that only 26 patients were 
studied and that little was generally 
known of objective mental measure- 
ment at this time. His unique method 
of localization cannot be taken seriously. 

In 1935 Weisenberg and McBride 
published a study of 60 aphasic 
patients. This was a more adequate 
study in that the number of subjects 
was larger, controls were used, and 
standardized measurements employed. 
An average of fifteen hours was spent 
examining each patient. The study 
offered substantial evidence that both
receptive and expressive processes
were always disturbed in aphasia, and 
isolated disturbances of single modali- 
ties (agraphia, alexia, acalculia, etc.) 
were not found clinically. Weisenberg
and McBride classified the
patients they observed as follows: 


1. Predominantly expressive. Dis- 
turbances varied from slight defects to 
almost complete loss of expression. 
Receptive processes were impaired, 
although less severely than expressive. 

2. Predominantly receptive. Im- 
pairment of understanding varied from 
slight to severe, but there was never 
an absolute loss. Jargon and para- 
phasia were present, and writing re- 
flected defects found in speech. 

3. Amnesic. Fundamental difficulty 
was evoking words. Receptive proc- 
esses were relatively satisfactory. 

4. Expressive-receptive. Very severe 
limitations were present in all language 
performances. 


Although this study is a landmark 
in the history of aphasia, the system 
of classification adds little to its significance.
If both receptive and expressive
impairment are always present,
the words become wastebasket terms 
in the context of aphasia. The authors 
limited themselves to a broad general 
classification, which avoided the naivete 
of the early diagram-makers, and some 
of the questionable linguistic assump- 
tions of Head. It was beyond the scope 
of their study to investigate relation- 
ships between observed symptoms, or 
the order of changes in recovery, with- 
out which a reliable classification 
system cannot be established. For 
example, amnesic or nominal defects 
do not show in patients who have no 
speech, but appear as language appears. 
They diminish and disappear in later 
recovery stages, leaving first syntactical
defects, and finally a picture much like
the one Head described for semantic
aphasia. 

Osgood (1953) has presented a 
psycholinguistic model for aphasia, 
which, although more sophisticated,
repeats the methodology of the nine- 
teenth century diagram makers: the 
model was constructed and deficits 
which should result from interruptions 
of psycholinguistic rather than neuro- 
logical processes were deduced. It 
remains to be determined whether or 
not breakdowns at predicted levels of 
encoding, association, and decoding 
processes can be discriminated clinically 
or shown empirically. If adequate data 
are obtained and found to fit the pro- 
posed conceptual model, two important 
questions still remain. The first is 
whether it may not be premature to 
make inferences regarding levels, and 
relationships between processes at 
specified levels of cerebral function, 
in view of the present limited knowledge 
of neurophysiological organization and 
mechanisms. The second is whether 
or not analyzing aphasic defects in 
terms of disorders of encoding, associa- 
tion, and decoding processes occurring 
at specified levels achieves simplifica- 
tion or clarification of observed symp- 
toms, or has predictive value. On the 
other hand, it is possible that such a 
model, empirically tested, may lead to 
important new insights concerning 
aphasia. 

Goodglass and Mayer (1957) inves- 
tigated the hypothesis, advanced by 
Jakobson and Halle (1956), that there 
are two independent processes oper- 
ating in normal speech, the first rep- 
resented by the use of words to 
symbolize concepts, and the second by 
use of the structural forms of con- 
nected speech. The authors considered 
their results to show that the two 
processes could be differentially im- 
paired in aphasia, and so supported 
Jakobson’s proposal. This conclusion
cannot be accepted, however, in the
absence of careful longitudinal studies. 
Clinical records and experience strongly 
suggest that “non-agrammatic” pa- 
tients often present a more severe 
language disturbance than agrammatic 
patients; that as recovery occurs they 
approximate the agrammatic group; 
that all degrees of difficulty using words 
as names for objects, events, and 
concepts are found in the agrammatic 
group, as well as in the non-agrammatic 
one; and that agrammatic patients do 
employ structural forms. 

In 1948 the establishment of an 
aphasia center at the Minneapolis VA 
Hospital made it possible to study 
large series of patients over extended 
periods of time. Since no method of
examination existed which was comprehensive
enough to yield sufficient
information or sensitive enough to
measure changes which occurred with
recovery, work was begun immediately
on constructing a test. No assumptions
were made about the nature of
aphasia, and no terms were used which 
were not operationally defined. The 
test was revised from year to year in 
order to eliminate artifacts and explore 
areas considered to need more inves- 
tigation as new insights were gained. 
The Minnesota Test for Differential 
Diagnosis of Aphasia (Schuell, 1955b) 
is now in its sixth revision and is still
in the experimental stage. In Minneapolis
more than 440 aphasic patients have
been tested, as well as a series of
40 patients with no neurological
involvement.

At the present time a classification 
system is used which is based upon 
analyses of test patterns of 117 patients 
tested on Forms 3, 4, and 5 of the 
Minnesota Test (Schuell: 1955a, 
1955b, 1957). The diagnostic criteria 
and accompanying prognosis for each 
group may be summarized as follows: 


Group I: Severe impairment of all 
language modalities. Patients in this 
group had no functional speech, read- 
ing, or writing, and all made errors 
pointing to common objects named by 
the examiner. None of them acquired 
functional speech, although they usually 
learned to repeat and to copy, and 
some reactive speech appeared. 

Group II: Impairment of auditory 
processes. Auditory recognition (de- 
fined as ability to point to common 
objects named by the examiner) was 
intact, but auditory retention span and 
auditory recall were impaired; this 
impairment was reflected in defective 
speech, reading, and writing. Errors 
in all modalities correlated with the 
length of stimulus presented or re- 
sponse demanded. Speech, reading, and 
writing improved simultaneously when 
intensive auditory stimulation was 
given, and these patients made excellent 
recovery. 

Group III: Impairment of auditory 
processes with coexisting visual in- 
volvement. Patients in Group III were 
like those in Group II, but in addition 
they showed specific impairment of 
visual recognition and visual recall. 
They confused letters and words with 
similar visual configurations, and 
frequently had difficulty following the 
line and keeping the place. They 
sometimes complained of blurring. Re- 
versals and distortions of letters ap- 
peared on writing, and substitution 
of letters which looked alike, such as 
bdpq, hnur, ft, wm, FE, and PBR, and 
in script ei, el, gyq, bf, and wu. Oral 
spelling tended to exceed written spelling.
All Group III patients had visual
field defects, but not all patients with field
cuts showed these symptoms.
Group III patients recovered speech 
well, but reading and writing improved 
more slowly. Rate remained retarded
and small inaccuracies tended to
persist. 

Group IV: Impairment of auditory 
processes with coexisting sensorimotor 
involvement. These patients did not 
show impairment of auditory recogni- 
tion, but impairment of auditory reten- 
tion span and recall was usually severe. 
They made consistent articulation 
errors, and speech tended to be hesitant 
and laborious as it was acquired. They 
often appeared not to know where the 
tongue was in the mouth, or how to 
move it to a desired position. They 
had difficulty articulating consonants 
requiring complex coordinations and 
consonant blends. Fluency was ac- 
quired slowly. Speech was functional 
as it was acquired, but remained 
limited and defective over long periods. 

Group V: Scattered auditory, visual, 
and motor findings. The severity of 
involvement of each process varied 
from patient to patient, producing
superficially different clinical pictures.
Cranial nerve involvement was usually
present. A good deal of language was
often retained, in one modality or
another. In all other groups age and
etiology were heterogeneous, but all 
Group V patients were over 50, and 
75 per cent were over 60 years of age. 
Eighty-three per cent were hyper- 
tensives who had incurred more than 
one clinical episode; seven per cent 
were arteriosclerotic, and seven per 
cent had incurred severe head injuries. 
It was of course only chance that the 
latter group contained no younger 
patients. Most of these patients were 
emotionally labile, and they tended not 
to be capable of sustained or self- 
directed effort. They were more con- 
cerned about physical problems, such 
as dizziness, headache, and dyspnoea 
than about speech. They usually 
fatigued more readily than other 
aphasic patients. Functional speech
could be increased or made more
intelligible in many cases, but only 
limited goals were achieved. 

It is estimated that about 90 per 
cent of the patients who have been 
studied have shown one of these easily 
identifiable clinical patterns. Some 
patients have, of course, been atypical. 
Other patterns which have occurred 
less frequently in the populations 
studied have been recognized clinically, 
but cannot be discriminated by existing 
tests. 

It should be pointed out that this 
classification system is dependent upon 
obtained patterns of impairment rather 
than levels of severity. In all groups 
except the first, patients of mild, 
moderate, and severe aphasic impair- 
ment are found. A system which de- 
pends on pattern rather than level of 
severity has more stability, since after 
physiological conditions have stabilized, 
the pattern can be identified whether 
the patient is seen early or late in the 
recovery period. A practical advantage 
of the classification system is its high 
predictive value, since all new evidence 
obtained has tended to support the 
prognoses initially reported for each 
group (Schuell: 1955a, 1955b, 1956, 
1957). 

It should be noted that the classifica- 
tion system, as it developed from test 
patterns and clinical experience, did 
not lead to identification of types of 
aphasia based on different kinds of 
language deficits. All patients showed 
reduction of auditory retention span, 
impaired word-finding, and impaired 
language formulation. 

Group I patients, who showed the 
most severe losses of all language 
functions, could be taught to repeat 
and to copy, and to produce as many 
as a hundred different words in one 
clinical period by being supplied a 
well-established association, such as
bread and ____, meat and ____, table
and ____, you sit in a ____. For these
patients, no matter how intensive the 
stimulation and reinforcement, recall 
never became voluntary, so speech re- 
mained nonfunctional. 

For all other patients, auditory
retention span, available vocabulary,
and use of the forms and structures of
connected speech increased simultane- 
ously as recovery occurred, and the 
increase was reflected in reading and 
writing, as well as in speech. 

Some of the superimposed defects, 
such as a paralyzed palate or severe 
visual limitations, were irreversible. 
Others responded well to treatment, 
and the residual disability became less 
and less obvious. Group III patients, 
for example, relearned the visual forms 
of symbols letter by letter, lower case,
upper case, printing and script, and 
reading and writing approximated their 
former quality. Rate remained slow, 
however, and tell-tale errors appeared
when patients were tested to limits
and large enough samples of performances
were obtained. Group IV
patients learned to repeat, and speech 
sounded normally fluent in most of the 
customary interchanges of every day, 
but motor disintegrations could be 
observed when more difficult tasks 
were presented, and in situations of 
stress. 

The upper limit of recovery for 
patients who have experienced a per- 
sisting aphasia is not known. 

Many Group IV patients, when 
initially seen, appeared as severely 
impaired as Group I patients, except 
they did not make errors pointing to 
common objects named by the exam- 
iner. The significance of this small 
difference in pattern is that it enables 
one to predict that one patient will be 
able to use all the language skills he 
regains in a functional manner; another
will not be able to go beyond repeating,
copying, and occasional reactive
responses. 


PROBLEM


Neither the history of hypothesized 
brain-area-speech-function relationships
nor empirical findings regarding
types of aphasia appeared to justify 
topological schemes and descriptions. 
On the contrary, the language behavior 
of patients who eventually recovered 
speech seemed to indicate that reported 
aphasic types were largely classifica- 
tions of different amounts of language 
deficit, or classifications of patients at 
different stages of recovery. Such 
clinical impressions, the pragmatic 
success of the classification system, 
and casual inspection of the over-all 
order of difficulty of different tests in 
the Minnesota battery suggested 
strongly that systematic investigation 
of the nature of this language disorder 
might well begin with the most 
elementary hypothesis possible: that 
all aphasias are part of a general 
hierarchy of language deficit. 

This idea is not new. Hughlings 
Jackson considered aphasia impairment 
of propositional speech. He pointed 
out in 1866 that the aphasic patient had 
not lost words, but the ability to use 
words to express meaningful relation- 
ships. Head (1926), influenced by 
Jackson, defined aphasia as impairment 
of symbolic formulation and expres- 
sion, although he still classified patients 
in groups which essentially followed 
old principles of localization. 

It is apparent, of course, that other 
deficits are present in many cases of 
aphasia. Visual and motor deficits, 
hearing loss, and defects of sensation 
are to be expected as frequent con- 
comitants of brain damage. However, 
it was hypothesized that insofar as the 
effects of such damage could be 
screened out or minimized in the tests, 
the language deficit itself would be
found to be part of a general hierarchy
of language behavior, differing only in 
amount (level of hierarchy) from one 
patient to another. 

The clinical justification for this 
hypothesis is as follows. In any 
adequate sample of an aphasic popula- 
tion, patients are found who have 
difficulty executing the movements 
required for speech, and other patients 
who have no such difficulty. Some 
patients are found who have difficulty 
recognizing, reproducing, and recalling 
visual configurations, and others who 
perform normally on tasks of this kind. 
Some patients are disoriented in space, 
and others are not. But whether such
findings are present or absent, all
aphasic patients tend to show a re- 
duction of the amount of available 
language which cuts across modalities 
and, if they improve, a certain consist- 
ency of recovery pattern. The present 
hypothesis does not deny the existence
of various concomitants of brain
damage which would be expected to 
constitute identifiable factors present 
in aphasic as in other brain-damaged 
populations; it is concerned, however, 
with the nature of the deficit which 
appears to be relatively independent 
of such factors, since it occurs in their 
presence and in their absence. 

Actually, of course, there is no such 
thing as a pure test, or an uncompli- 
cated process, for that matter. Per- 
formance on any test item involves 
present and past states of cortical 
activity and a complexity of incoming 
stimuli, before a response of any kind 
can be organized and executed. Never- 
theless, responses can be compared, 
deviations studied, and useful dimen- 
sions extracted. 


METHOD 


It was decided to use Guttman’s scale 
analysis (1944, 1947, 1950) in order to test 
for the existence of a single hierarchy of 
language functions. If a sizeable number
of language tests were found to be scalable
over a wide range of test content and test 
difficulty, and over a heterogeneous collection 
of aphasic patients, this could be considered 
as contributing support to the hypothesis of 
the unidimensionality of language deficit in 
aphasia. 

For those unfamiliar with scale analysis, 
the argument proceeds simply as follows: 
(a) Assume that the language deficit in 
aphasia is a homogeneous deficit varying 
in amount from patient to patient. (b) 
Assume further that performance on most 
simple tests involving language (whether 
they are tests of comprehension, reading, 
writing, or speaking) depends largely on 
the amount of general deficit present. (c) 
Finally, assume that these tests involve 
different amounts of this language com- 
petence; that is, they are spaced along a 
language continuum, in order of difficulty. 

If all these assumptions hold, patients 
with extreme deficit will pass only the 
easiest test, or none at all. Patients with 
less extreme deficit will pass all the easiest 
tests, and some of the next most easy ones. 
Patients with moderate deficit will pass all 
of the easy tests and some of the more 
difficult ones, and so on. In general, as 
less deficit is found, patients will pass all 
the tests passed by patients with more 
deficit, and then the next easiest tests. 
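
The cumulative pattern described above can be illustrated with a short sketch in Python. The patient labels, the five-test battery, and the number of tests each patient passes are hypothetical values chosen only for illustration; they are not data from the study.

```python
# Illustrative sketch only: hypothetical patients and a hypothetical
# five-test battery. Tests are indexed from easiest (0) to hardest (4).

def ideal_pattern(n_passed, n_tests):
    """Pass (1) / fail (0) record of a patient who passes exactly the
    n_passed easiest tests, as a perfect scale would require."""
    return [1 if rank < n_passed else 0 for rank in range(n_tests)]

N_TESTS = 5

# Hypothetical number of tests each patient is able to pass.
patients = {
    "extreme deficit": 1,
    "less extreme deficit": 2,
    "moderate deficit": 3,
    "mild deficit": 5,
}

matrix = {name: ideal_pattern(k, N_TESTS) for name, k in patients.items()}

for name, row in matrix.items():
    print(f"{name:22s} {row}  score = {sum(row)}")
```

In a matrix of this kind each row is fully determined by its total score, which is the property the scale analysis looks for.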

If the tests scale, it becomes possible to 
give the patient a single numerical score 
indicating the degree of severity of the 
language deficit, from which one can tell 
which tests he passed and which he failed. 
Conversely, if the difficulty level of a test 
is known, one can tell which patients 
passed and which patients failed it. These 
are precisely the conditions for high reproducibility
in the Guttman scale analysis.
The reproducibility index represents the 
percentage of successful predictions for the 
entire sample of subjects for all the tests 
scaled, when test difficulty and severity of im- 
pairment are known. If the test-subject matrix 
is found to be scalable (that is, if it is about 
90 per cent reproducible), this is compatible 
with the hypothesis of a single dimension of 
language deficit. Ordinarily, when scal- 
ability is found, it is assumed that a unidi- 
mensional continuum exists. If the matrix 
is not scalable, then the hypothesis of a 
single measurable dimension is not tenable
and other models or descriptions must be 
sought. 
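
A reproducibility index of this kind might be computed as in the following Python sketch. The text does not spell out the error-counting rule, so the sketch assumes one common convention: each patient is predicted to pass exactly as many of the easiest tests as his total score, and every cell that departs from that prediction counts as one error. The function name and both example matrices are hypothetical.

```python
# Minimal sketch under an assumed error-counting convention (see lead-in).
# Rows are patients, columns are tests; 1 = pass, 0 = fail.

def reproducibility(matrix):
    """Proportion of pass/fail cells correctly predicted from each
    patient's total score and the difficulty order of the tests."""
    n_patients = len(matrix)
    n_tests = len(matrix[0])
    # Tests passed by more patients are treated as easier.
    passes = [sum(row[j] for row in matrix) for j in range(n_tests)]
    easiest_first = sorted(range(n_tests), key=lambda j: -passes[j])
    errors = 0
    for row in matrix:
        score = sum(row)
        for rank, j in enumerate(easiest_first):
            predicted = 1 if rank < score else 0
            if predicted != row[j]:
                errors += 1
    return 1 - errors / (n_patients * n_tests)

# Hypothetical data: a perfectly cumulative matrix and one with a deviant row.
perfect = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
one_deviant = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0]]

print(reproducibility(perfect))      # 1.0
print(reproducibility(one_deviant))  # 0.875
```

Under the criterion of about 90 per cent mentioned above, the second of these toy matrices would fall just short of scalability.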

It is readily apparent that if there are 
frequent occurrences of different aphasic 
types, the matrix will not prove to be
scalable. A “sensory aphasia” and a “motor
aphasia” might be at the same severity 
level, but presumably would pass and fail 
quite different tests. This would result in 
a nonreproducible matrix, reflecting diver- 
gent factors influencing test performance. 
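
The point can be made concrete with a deliberately exaggerated Python sketch. The two profiles below are hypothetical caricatures of a "sensory" and a "motor" type, each passing two of four tests, and the test labels are illustrative rather than the battery's own. The sketch tries every possible difficulty ordering of the tests and counts, under the assumption that a patient is predicted to pass as many of the easiest tests as his total score, how many cells are mispredicted.

```python
from itertools import permutations

# Hypothetical test labels, chosen only for illustration.
TESTS = ["pointing to named objects", "following directions",
         "repeating words", "naming objects"]

# Hypothetical profiles (1 = pass, 0 = fail), both with a total score of 2.
sensory_type = {"pointing to named objects": 0, "following directions": 0,
                "repeating words": 1, "naming objects": 1}
motor_type = {"pointing to named objects": 1, "following directions": 1,
              "repeating words": 0, "naming objects": 0}

def errors_for_order(order, patient):
    """Cells mispredicted when the patient is assumed to pass the first
    `score` tests in the given easiest-to-hardest order."""
    score = sum(patient.values())
    return sum((1 if rank < score else 0) != patient[test]
               for rank, test in enumerate(order))

# Search all orderings for the one that mispredicts the fewest cells.
best_errors = min(
    errors_for_order(order, sensory_type) + errors_for_order(order, motor_type)
    for order in permutations(TESTS)
)
best_reproducibility = 1 - best_errors / (2 * len(TESTS))

print(best_errors, best_reproducibility)  # 4 0.5
```

No ordering of the tests reproduces both profiles from the same total score, so a sample containing many such pairs could not yield a scalable matrix.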

Overall, then, if the tests (a) include
content which would be appropriate
to show types if they existed and (b)
vary in difficulty, if a general sample of
aphasic patients is used, and if the resulting
patient-test matrix is scalable, this may be
considered to support the hypothesis that
the language deficit in aphasia is unidimensional,
and the necessity for topological
systems of classification becomes questionable,
to say the least.


Subjects 

Subjects were 123 patients admitted to the 
Aphasia Division of the Neurology Service 
of the Minneapolis VA Hospital between 
June 1954 and April 1957. These were con- 
secutive admissions, except that patients with 
no aphasia, patients without persisting apha- 
sia, patients seen too soon after the onset of 
aphasia for physiological conditions to have 
stabilized, patients who could respond to 
no test items, and psychotic patients were 
excluded. Patients were not judged 
psychotic unless clearly aberrant behavior 
was present, such as outbursts of unprovoked 
rage, delusional states, or regressive behav- 
ior; in every such case the diagnosis was 
confirmed by psychiatric evaluation. 

Age range was from 19 to 76; slightly 
more than half the patients were over 50. 
Etiology included cerebrovascular accidents, 
tumors, arteriosclerosis, and cerebral enceph- 
alopathy resulting from trauma and other 
causes. Slightly more than half the patients 
had incurred cerebrovascular accidents. 

The records of one hundred aphasics who 
had been tested on Form 4 and Form 6 of 
the diagnostic battery were drawn from the 
files for study. Twenty-three patients who 
had been tested on Form 5 were held out 
as an independent sample which could be 
studied later for cross-validation purposes. 


Tests 

The tests employed in this analysis were 
taken from Form 6 of the Minnesota Test 
for Differential Diagnosis of Aphasia, by 
Schuell (1955b). The tests were given 
individually as part of the diagnostic testing 
program, on entrance to the clinic. The 
battery is given in several sessions to avoid 
fatigue. The testing time for the complete 
battery is approximately three hours. 


Of an original battery of 62 tests investi- 
gating aphasic disabilities, 13 tests exploring 
disturbances of numerical concepts, arith- 
metic processes, and body schema were 
discarded as peripheral to the central 
problem being investigated. Six tests could 
not be used because they were not common 
to the three forms of the test from which 
data were obtained. In addition, (a) audio- 
metric findings, (b) gross tests for visual 
perception, such as pointing to crosses 
arranged in various positions on a page, 
matching colors, forms, pictures, and 
symbols, (c) imitating gross movements of 
speech musculatures, and (d) copying and 
drawing tasks were eliminated because they 
were not considered primarily language 
tests. It should be pointed out that eliminating
gross perceptual and motor tests could
not screen out the effects of such disturbances 
on language performances, however. 

The 29 tests which remained included 
tests of auditory comprehension, reading, 
writing, and speech at various levels of 
difficulty. They employed auditory, visual, 
and combined presentation of stimuli, and 
required various response modes, such 
as speaking, writing, and gross motor 
responses. 

One control test (rapid alternating move- 
ments) was included which was considered 
non-language, although it was still a test of 
speech function. It was hoped that this 
test would demonstrate the ability of the 
scaling procedure to reject irrelevant tests 
and provide a control against building a 
hierarchy which reflected general extent of 
brain damage, rather than language deficit. 

A list of the tests, with brief descriptions, 
follows. (Items followed by asterisks were 
included in the best 18 tests selected in the first
analysis.)


AUDITORY 


1. Auditory recognition.* A group of 
objects, then groups of pictures are displayed 
and S is asked to point to object or picture 
named by the examiner. Visual difficulties 
are first screened by a matching task. 18 
items. 

2. Recognition of symbols.* A series of 
cards, each containing five or six letters or 
numbers, are presented individually, and the 
patient is required to point to the symbol 
named by the examiner. 26 items. 

3. Repetition of sentences. Patient is 
required to repeat sentences equated for 
difficulty, but of progressive length. 10 
items.

4. Repetition of digits. Tests from the
Terman-Merrill revision of the Stanford-Binet,
Form L, are used, to 10th year level.
4 items. 

5. Directions.* The patient is asked to 
follow directions of progressive length. 
Examples: “Ring the bell. Put the spoon 
in the cup. Point to the comb, pencil, and 
the key.” 10 items. 

6. Detecting errors.* Statements are 
read to the patient, who is asked to indicate 
whether each statement is right or wrong. 
Example: “You eat with a knife, fork, and 
comb. Is that right?” 6 items. 

7. Short paragraph. A short narrative 
paragraph is read to the patient, and he is 
asked questions about the content, which 
can be answered by yes or no. 5 items.

8. Long paragraph.* Same as above, 
except the paragraph contains 13 lines 
instead of four. 6 items. 


READING 


1. Matching word to picture.* Card with 
single printed word is displayed, and patient 
is required to point to appropriate picture. 
The pictures are the same as those used 
for auditory recognition. 12 items. 

2. Sentence comprehension. Patient is
required to read a series of questions and
check them yes or no. Examples: “Are
there seven days in a week? Does everyone 
put money in the bank?” 10 items. 

3. Short paragraph. Patient reads a short 
paragraph, approximating fourth grade 
reading level, and answers questions about 
content by checking yes or no. 4 items. 

4. Long paragraph. Same as above, 
except that material is easy adult level and 
longer. 5 items. 


SPEECH AND LANGUAGE 


1. Rapid alternating movements. Patient 
is asked to imitate syllables ma, la, ka, and 
kala, then repeat each one as rapidly as he 
can. Time limits are generous. 4 items. 
(This was included as a “non-language” 
test.) 

2. Repetition of words.* Patient is re- 
quired to repeat 30 common monosyllabic 
words after the examiner. All consonants 
and consonant blends are used, and any 
mispronunciation is scored as an error.
30 items. 

3. Sentence completion.* Simple sentences 
containing common associations are read to 
the patient, who is asked to supply the last 
word. Example: “Please pass the salt and
____.” Any appropriate word is scored
correct. 8 items.

4. Serial responses.* The patient is asked
to count to 20, then name the days of the
week and the months. 30 items.

5. Simple questions.* The patient is
asked questions requiring single word
responses. Examples: “What do you shave
with? What do you do with money?”
8 items.

6. Naming.* Patient is asked to name the
objects and pictures used for auditory
recognition. Naming precedes auditory
recognition in administration of the test.
18 items.

7. Rhymes.* Patient is asked to give a
word which rhymes with go, tree, and car.
3 items.

8. Definitions. Patient is asked to explain
the meaning of the following words: robin,
island, motor, bargain, courage, repair, and
leather. Qualitative scoring according to
criteria established by classifying obtained
responses. 7-point scale.

9. Giving information.* Patient is asked 
to supply biographical data in response to 
specific questions. Some questions require 
only a simple response, such as name and 
address, and others require more elaboration, 
as describing his job, or how he occupied 
his time at home. 12 items. 

10. Picture description.* Patient is asked 
to describe a picture, and tell what is 
happening in it. Qualitative score, as on 
Test 8. 7-point scale. 

11. Expressing ideas.* Patient is asked 
to tell three things he has done during the 
day and three things a good citizen should 
do. 6 items. 

12. Similarities.* Patient is asked to tell 
how two things, such as a knife and a fork, 
are alike. Two errors are scored if patient 
can state no similarity, and one for failure 
to place objects in category. 12 items. 

13. Proverbs. The patient is asked to 
explain the meaning of three common 
proverbs. Item is not scored correct unless 
there is appropriate generalization. 3 items. 


WRITING 


1. Letters to dictation.* Patient is re- 
quired to write letters of alphabet dictated in 
random order. 26 items. 

2. Written spelling. Patient is asked to 
write eight words to dictation. Two words 
are dictated from third, fourth, fifth, and 
sixth grade spelling levels. 8 items. 

3. Sentences to dictation.* Patient is 
required to write sentences dictated by 
examiner. Each sentence is dictated as a 
unit, and sentences are progressive in length. 
No words beyond fifth grade spelling level
are used. 8 items. 

4. Spontaneous writing. Patient is asked 
to write a paragraph describing a picture and 
telling what is happening in it. Qualitative 
scoring, as on Tests 8 and 10. 7-point
scale. 


Plan of Analysis 


Each test was treated as an “item” being 
considered for inclusion in a homogeneous 
scale. Although the tests were scored 
quantitatively, both the general orientation 
of the study and the distribution of scores 
on the tests (discussed below) argued for 
treating the data on a simple “pass-fail”
basis. Each patient was given a total score 
equal to the number of tests passed with- 
out error. Each test was given a difficulty 
level determined by the number of patients 
passing it. Phi coefficients were computed 
for each test and for each patient in the 
manner suggested by White and Saltz 
(1957). The tests with the highest phi 
coefficients were used in the scale analysis 
itself and were subjected to further intensive 
study in the various diagnostic-prognostic 
patient groups. Modifications were made in 
the passing and failing levels in those tests 
where it appeared advantageous to increase 
the homogeneity of the test or to obtain a 
better distribution of difficulty levels in the 
battery considered as a whole. Repro- 
ducibility of the modified scoring was com- 
puted. An abbreviated scale made up of 
a small number of best tests was analyzed 
for scalability. A universe of tests suggested 
by clinical experience was separately tested 
for scalability also. Finally, the scales 
developed on the first sample of patients 
were applied to a new sample of patients 
and the scalability again computed. 
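
The pass-fail scoring, total scores, and difficulty levels described in this plan can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `score_pass_fail`, the layout of the `errors` matrix, and the toy data are hypothetical and are not taken from the study.

```python
def score_pass_fail(errors, max_errors=0):
    """Convert error counts to pass-fail scores.

    errors[p][t] is the number of errors patient p made on test t
    (an assumed layout). A test counts as passed when the error
    count does not exceed max_errors; 0 means passed without error.
    """
    passed = [[e <= max_errors for e in row] for row in errors]
    n_patients = len(passed)
    n_tests = len(passed[0])
    # Total score: number of tests each patient passed.
    totals = [sum(row) for row in passed]
    # Difficulty level: per cent of patients passing each test.
    per_cent_passing = [
        100.0 * sum(row[t] for row in passed) / n_patients
        for t in range(n_tests)
    ]
    return passed, totals, per_cent_passing


# Toy data: 4 patients by 3 tests (not from the study).
toy_errors = [[0, 0, 2], [0, 1, 5], [3, 4, 9], [0, 0, 0]]
passed, totals, per_cent_passing = score_pass_fail(toy_errors)
print(totals)            # [2, 1, 0, 3]
print(per_cent_passing)  # [75.0, 50.0, 25.0]
```

Raising `max_errors` above zero gives the kind of relaxed cutting line that the later section on modified scoring experiments with.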


RESULTS AND DISCUSSION 


Score Distributions of Individual Tests 


The distribution of scores on each 
test was plotted for the 100-patient 
sample. The majority of the tests 
yielded U-shaped and J-shaped distribu- 
tions. This was interpreted to mean 
that the tests represented relatively 
homogeneous behavior samples. When 
the tests were easy, virtually all 
patients passed or nearly passed them; 
when they were difficult, virtually all 
patients failed them by a wide margin;
when they were at intermediate levels
of difficulty, tests tended to split the 
patient population into two groups, 
those passing and those failing, with 
few patients falling at intermediate 
scores. Only a few tests yielded ir- 
regular multimodal distributions. It 
should be noted that the nature of these 
distributions argues against the tradi- 
tional correlational analysis of these 
data and suggests that if there is any 
organization of the tests it will be of 
the hierarchical, scalable kind. 


Phi Coefficients of the Tests 


Phi coefficients were calculated for 
each test in the following manner. The 
difficulty of each test was measured by 
percentage of subjects passing. Sever- 
ity of aphasia was measured for each 
subject by number of tests passed. 

This procedure resulted in a distribu- 
tion of tests in order of difficulty, and 
a distribution of subjects in order of 
severity of aphasic involvement, as 
defined above. 

A fourfold table was prepared for 
each test, showing how many of the 
subjects who passed belonged to the 
group with correspondingly high total 
scores, and how many of the subjects 
who failed belonged to the group with 
correspondingly low total scores. Fig- 
ure 1, representing the distribution for 
a good test, will clarify this. 

Seventy per cent of all subjects passed the test
represented. Sixty-five of the seventy who passed were
in the upper seventy per cent of the distribution of
patients on the basis of total score; five per cent
were not. Thirty per cent of all subjects failed the
given test. Twenty-five of these were in the lowest
thirty per cent of the distribution; five were not.
Thus the test misclassified ten patients out of a
hundred. The obtained phi coefficient is .762. This is
a high phi coefficient, and the test would be judged to
be measuring, for the most part, factors common to
those measured by the majority of the other tests in
the scale.


                     Severity of Aphasia
                  (Number of Tests Passed)

                     Low          High

Pass Test          a =  5       b = 65        70

Fail Test          c = 25       d =  5        30

                       30           70

$$\phi = \frac{bc - ad}{(a + b)(c + d)} = .76$$

FIG. 1. Illustrative example of the computation of a
phi coefficient.


Essentially the phi coefficient represents the
relationship between the distribution for a single test
and the distribution for the entire scale; a good phi
coefficient can be interpreted to mean that the test is
successful in measuring whatever is being measured in
the battery of which it is a part. It is not an
absolute measure, since it is affected by the
difficulty level of the test under consideration.*
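
The computation of the phi coefficient for the example test can be written out as a short Python sketch. The cell labels a, b, c, d and both function names are assumptions made for illustration; the shortcut in `phi_from_percentages` relies on the equal-marginal condition described in the text, with N = 100.

```python
from math import sqrt


def phi_fourfold(a, b, c, d):
    """Phi coefficient for a fourfold table laid out as

                 low total   high total
        pass         a           b
        fail         c           d

    (the cell labels are an assumed convention).
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (b * c - a * d) / denominator


def phi_from_percentages(per_cent_passing, per_cent_misclassified):
    """Shortcut for equal marginals and N = 100.

    Half of the misclassified patients are unexpected passes and
    half are unexpected fails.
    """
    e = per_cent_misclassified / 2.0
    p = per_cent_passing
    return phi_fourfold(a=e, b=p - e, c=100 - p - e, d=e)


# Example from the text: 70 pass, 65 of them in the upper 70.
print(round(phi_fourfold(a=5, b=65, c=25, d=5), 3))  # 0.762
print(round(phi_from_percentages(70, 10), 3))        # 0.762
```

Because the marginal totals are equal, the general denominator reduces to (a + b)(c + d), which is 70 times 30 in the example.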

The tests were then studied with respect to phi
coefficient, level of difficulty, and quality of
errors. These data are presented in Table 1. The tests
are arranged in descending order of magnitude of the
phi coefficients. Difficulty level is represented by
per cent passing. “Goodness” of errors represents a
qualitative judgment of the scatter of the errors of
misclassification among the total-score groupings. For
example, if a test is passed by 30 patients and 25 of
the patients are in the highest 30 in total score, the
errors of misclassification may be “good” (the
remaining 5 passes may be found in the next highest ten
per cent of the patients), “fair” (for example, the
five passes may be found in the next highest 20 per
cent of the patients), or “poor” (the five passes may
be found distributed far down the total-score
hierarchy).


* White and Saltz may convey the impression that phi
varies from zero to 1.00 under these conditions.
However, the marginal totals here are constrained to be
equal (the “upper” percentage on the total score
distribution is determined by the number of patients
passing the item in question). Under these conditions
phi varies from an upper limit of 1.00 to a lower limit
which is determined by the difficulty level of the
item. (The lower limit varies from -1.00 to -.05, for
example, as the difficulty varies from 50 to 90 per
cent.)


A study of the table reveals generally 
encouraging results. Eight tests yield 
phi coefficients of .70 or higher, seven 
tests have phi coefficients between .60 
and .70; five tests fall between .50 and 
.60, four are between .40 and .50, and 
only five tests fall below .40. Several 
considerations entered into the decision 
concerning the retention of the “best” 
tests for scale analysis. In part this 
selection was determined by the magni- 
tude of the phi coefficients, in part by 
the quality of errors. Since the 
alternating movements test (which was 
not considered primarily a language 
test) had a coefficient of .46, it was 
considered that the lower limit of 
acceptability should be somewhat above 
that point. A distribution of test 
difficulties argued for the inclusion of 
sentences to dictation and similarities 
in order to sample the “difficult” end of 
the continuum adequately. A consideration
of the qualitative ratings
seemed to show that phi coefficients 
below the mid-50’s were in general 
accompanied by poorer errors than 
those above that point. In the light 
of these considerations, the decision 
was made to drop the tests with phi 
coefficients below .57, discarding eleven 
tests and retaining eighteen for further 
analysis. 
TABLE 1


DATA ON 29 TESTS FOR ORIGINAL SAMPLE OF 100 PATIENTS


                                       Phi        Per Cent      Per Cent
Tests                              Coefficient  Misclassified   Passing

Reading: word to picture               .78            8            76
Speech: giving information             .76           12            47
Speech: picture description            .76            8            21
Speech: sentence completion            .76           12            46
Speech: serial tasks                   .72           14            50
Speech: rhyming                        .71           12            30
Auditory: recognition                  .71           12            71
Speech: response to questions          .70           14            38
Auditory: following directions         .69           14            35
Speech: repetition of words            .68           16            48
Speech: naming                         .67           16            40
Auditory: long paragraph               .65            8            13
Writing: letters to dictation          .64           16            34
Speech: expressing ideas               .63           14            25
Auditory: pointing to symbols          .60           20            46
Writing: sentences to dictation        .59            8            11
Speech: explaining similarities        .57           12            17
Auditory: recognizing errors           .57           24            44
Writing: spelling                      .53           12            15
Reading: short paragraph               .53           22            37
Speech: alternate movements            .46           26            60
Auditory: sentence repetition          .46            8             8
Auditory: digit repetition             .42           14            14
Auditory: short paragraph              .40           30            54
Speech: definitions                    .39           10             9
Writing: spontaneous paragraph         .31            4             3
Reading: sentence                      .25           22            18
Speech: explaining proverbs            .23           10             7
Reading: long paragraph                .23           10             7














Phi Coefficients for the Subjects 


Phi coefficients were calculated for
each subject in a manner analogous to 
the calculation for each test. It was 
not our purpose to purify the subject 
sample, since we wished to study the 
whole class of aphasics, but it was 
considered that the clinical diagnostic 
groups should be observed early in the 
analysis to see if they were related 
differentially to whatever it was that 
the tests were measuring. 

These data showed that almost the 
complete range of phi coefficients was 
present in each diagnostic group. There 
appeared to be no evidence that clearly 
separate populations were involved. 


Guttman Analysis 


Total sample. The 100 Ss were 
given new total scores based on the 
eighteen “best” tests. The subjects 
were ranked from high to low; the 
tests ranked from easy to difficult and 
the distribution of individual passes 
and fails was plotted. For each test 
new phi coefficients were computed and 
reproducibility of the whole matrix was 
computed. The phi coefficients are 
reported in Table 2. The coefficient 
of reproducibility for the entire array 
of tests and subjects was found to be 
89.9 per cent. 

Such results for a battery as large as this one may
certainly be regarded as encouraging. It could be
argued that results are inflated by inclusion of tests
at both extremities of the difficulty continuum and by
inclusion of patients completely without speech, or
with nearly normal speech. However, study of difficulty
levels of various tests indicated a wide distribution
over the entire range of difficulty, and classification
errors were not markedly different for tests at extreme
levels of difficulty and tests in the middle levels,
although, of course, tests at intermediate levels
showed more errors.

Edwards (1957) suggests that the coefficient of
reproducibility be compared with its lower limit: the
percentage of correct classification achieved when each
item is predicted in its most popular direction. The
reproducibility coefficient obtained represents a
substantial gain: 89.9 compared to the minimum of 66.8.
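
The two quantities compared here can be sketched in Python. The minimum is the mean, over tests, of the larger of per cent passing and per cent failing; applied to the per cent passing values of the eighteen retained tests as read from Table 1 (one entry is hard to read in the source and is taken as 44 here), it gives the 66.8 quoted above. The error count in `reproducibility` is an assumed convention (each patient's responses are compared with the perfect scale pattern for his total score); the source does not say exactly how errors were counted, so that function and its toy data are illustrative only.

```python
def minimum_reproducibility(per_cent_passing):
    """Mean per cent classified correctly when every test is
    predicted in its more popular direction."""
    return sum(max(p, 100 - p) for p in per_cent_passing) / len(per_cent_passing)


def reproducibility(passed):
    """Per cent of responses agreeing with a perfect scale pattern.

    passed[p][t] is 1 (pass) or 0 (fail). A patient with total score
    s is predicted to pass exactly the s easiest tests; every
    departure from that pattern counts as one error (an assumed
    counting rule).
    """
    n_tests = len(passed[0])
    n_pass = [sum(row[t] for row in passed) for t in range(n_tests)]
    easiest_first = sorted(range(n_tests), key=lambda t: -n_pass[t])
    errors = 0
    for row in passed:
        score = sum(row)
        for rank, t in enumerate(easiest_first):
            predicted = 1 if rank < score else 0
            if row[t] != predicted:
                errors += 1
    return 100.0 * (1 - errors / (len(passed) * n_tests))


# Per cent passing for the 18 retained tests, as read from Table 1.
best_18 = [76, 71, 50, 48, 47, 46, 46, 44, 40, 38, 35, 34, 30, 25, 21, 17, 13, 11]
print(round(minimum_reproducibility(best_18), 1))  # 66.8

# Toy matrix: 6 patients by 3 tests (not from the study).
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
print(round(reproducibility(toy), 1))  # 77.8
```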

A study of patients indicated that the lower end of
the patient distribution was dominated by Group I
patients, who would add to the reproducibility of any
such matrix, because they failed a very high proportion
of the tests.


TABLE 2


DATA ON BEST 18 TESTS FOR ORIGINAL SAMPLE OF 100 PATIENTS


                                       Phi        Per Cent
Tests                              Coefficient    Passing

Reading: word to picture
Auditory: recognition
Speech: serial tasks
Speech: repetition of words
Speech: giving information
Speech: sentence completion
Auditory: pointing to symbols
Auditory: recognizing errors
Speech: naming
Speech: response to questions
Auditory: following directions
Writing: letters to dictation
Speech: rhyming
Speech: expressing ideas
Speech: picture description
Speech: explaining similarities
Auditory: long paragraph
Writing: sentences to dictation

Note.—Reproducibility: 89.9; minimum reproducibility: 66.8.


In order to study these problems 
further, the analysis was repeated on 
the subgroups of patients (using 
classification groups given above), and 
a study was made of the scoring of the 
individual tests and of the possibility 
of arranging their levels of difficulty 
to sample the entire range more 
systematically. 

Subsamples. Group I. Only 6 of 
these patients passed any of the tests 
in the Minnesota battery; 11 failed all 
18 tests. It was considered that this 
subgroup was therefore not appropriate 
for separate analysis and, indeed, might 
have functioned as hypothesized to 
increase the general reproducibility of 
the test-subject matrix. No further 
analysis was performed on this group. 

Only 5 patients in the sample of 
100 were classified as Group IV. 
While the data for these patients were 
consistent with the hierarchy estab- 
lished in the general test-subject 
matrix, the sample was too small for 
any meaningful analysis. Therefore, 
this group was excluded from further 
work at this point. 

Groups II, III, and V, with 31, 20,
and 27 patients respectively, were 
chosen for independent scale analysis. 
Group II, as stated above, is the group 
which most resembles “pure” aphasia. 
Group III may be considered to be a 
basic aphasia with an overlay of visual 
difficulties, and Group V may be con- 
sidered as a population with aphasia 
and diffuse visual and motor difficul- 
ties. These groups were analyzed 
separately and combined for a 
new over-all analysis of test-subject 
scalability. 

While the ordering of tests, in terms of relative
difficulty, varied somewhat from group to group, in all
three patient groups the 18 tests were found to be
scalable. For Group II the tests were 90.7 per cent
reproducible; for Group III the test scores were 86.9
per cent reproducible; for Group V they were 91.6 per
cent reproducible. When the three groups were combined
(N=78 cases) the over-all reproducibility was found to
be 87.7 per cent.


TABLE 3


RANK ORDER CORRELATION COEFFICIENTS BETWEEN SUBGROUPS


                           Group II    Group III    Group V
                           (N = 31)    (N = 20)     (N = 27)

Total Sample (N = 100)       .91         .89          .98
Group II                                 .70          .87
Group III                                             .90


The comparison of difficulty levels of 
specific tests from one clinical group 
to another tends to confirm the original 
differential diagnoses based on the 
pattern of deficit present in each 
patient group. Table 3 presents the 
rank-order correlation coefficients be-
tween the orders of difficulty of the 
tests for the entire group, and for each 
of the subgroups considered here. 

It can be seen that the order of 
difficulty of the tests is very similar 
between each group and the total 
sample. In the group-to-group com- 
parisons, Group II and Group III 
appear as the most deviate pair, 
although each correlates highly with 
Group V and the total sample. 
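
A rank-order correlation between two orderings of the same tests can be computed as sketched below. The sketch assumes the coefficient is Spearman's rho and that there are no tied ranks; the source does not name the coefficient, and the two rankings shown are invented for illustration.

```python
def rank_order_correlation(ranks_x, ranks_y):
    """Spearman's rho for two rankings of the same items (no ties)."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Hypothetical difficulty ranks of five tests in two patient groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 1, 3, 5, 4]
print(rank_order_correlation(group_a, group_b))  # 0.8
```

With tied ranks, which can arise when two tests are passed by the same number of patients, this shortcut formula is only approximate.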

Table 4 presents the tests on which rank-orders of the
three subgroups differed by three or more ranks. There
are nine tests for which this is true. Six of these are
speech tests which use no visual materials and require
no writing. On five of these six tests (all except the
serial tasks, the easiest of the speech tests) the
differences favored Group III in this sample; that is,
the speech tests were easier for the patients in Group
III than for those in Group II or Group V. This finding
was interpreted to mean that there were more mild
aphasics in Group III than in Group II. This is
considered an artifact of the population sample, and
can be explained by a selective process operating. Many
patients request admission or are transferred to the
Minneapolis VA Hospital primarily for treatment of
aphasia. It is probable that many Group III patients
who would not have sought hospitalization for minimal
speech difficulties were motivated to do so by the more
serious trouble they had reading and writing. Although
Group III in this sample contained a few patients with
severe or moderately severe aphasia, it also contained
more very mild cases than did Group II.


TABLE 4


TESTS ON WHICH RANK ORDER DIFFERED MORE THAN THREE RANKS


Columns (Rank Order and Per Cent Passing for each):
Combined Groups (N = 78); Group II (N = 31);
Group III (N = 20); Group V (N = 27)

Tests with no visual components
    Serial tasks
    Information
    Sentence completion
    Response to questions
    Expressing ideas
    Similarities
    Mean for six tests

Tests with visual components
    Pointing to symbols
    Naming pictures
    Letters to dictation
    Mean for three tests

Mean for nine tests

However, the remaining three tests, which showed
significant differences in order of difficulty, were
tests which used visual materials or required written
responses, and on each of these Group II was superior
to Groups III and V. The implications of this seem
clear. On each of these three tests there was a visual
component which depressed the scores of both the
patients in Group III and those in Group V.

Naming, the first of these tests, uses 
pictures. Superficially it appears 
strange that matching words to pictures 
and auditory recognition did not show 
significant differences in order of 
difficulty for the subgroups. The 
answer probably lies in the nature of 
the three tasks for aphasic patients. 

In the first place, although the order of difficulty
for these two tests was the same for all subgroups
(they are the easiest tests in the battery), they were
passed by a larger percentage of patients in Group II
than those in Groups III or V, although Group III
patients did better than those in Group II on most
language tests.

Matching words to pictures and 
auditory recognition are both essen- 
tially matching tasks: matching the 
printed word and spoken word to a 
picture. Previous studies have shown 
that only the most severely impaired 
aphasic patients had difficulty with the 
matching tasks on the battery (Schuell, 
1955b). Group III patients with mild 
and even moderately severe impairment 
match most words to pictures at this 
level. Errors tend to appear when 
patients are asked to discriminate 
words which look a good deal alike, 
such as house-horse, stone-store or 
match-watch, but such tasks do not 
appear on this test. 

Naming is a more difficult task. 
Neither the spoken nor the printed 
word is present to give added clues to 
the picture, if it is not perceived 
clearly, or is perceived in a distorted 
fashion. The number of clues is re- 
stricted, and for Group III patients 
this increased the difficulty of the task 
markedly. 

Pointing to symbols and writing 
letters to dictation require recognition 
and recall of symbol forms. Recogni- 
tion of symbols is always more impaired 
for aphasic patients than recognition of 
objects. Both Group II and Group III 
patients frequently confuse letters 
whose names sound alike, such as 
bedegtpz, hajk, iy, and uq, and letters 
closely associated in serial learning. 
In addition, Group III patients confuse 
letters with similar configurations, such 
as bdpq, hnru, WM, EF, JL, and PBR 
(Schuell, 1954). These errors are 
strikingly consistent. 

They appear day after day in all 
samples of a patient’s writing, are 
mirrored in reading, and the same 
confusions appear for patient after 
patient. 

Severely impaired Group III patients 
can recognize or recall no symbol
forms. They are retaught lower case 
symbols first to aid reading. Different 
errors appear when script is introduced. 
Then el, gy, bf, hk, ao, and so on are 
confused. Over all, it seems apparent 
that these three tests constitute visual 
tasks for patients who have impaired 
visual processes, and present more 
difficulty to these patients than to 
patients who do not have specific 
visual involvement. 

Group V patients, who have audi- 
tory, visual, and motor deficits, fall 
below the mean number of patients 
passing all tests, but they are more 
like Group III than Group II patients 
on over-all test patterns, which would 
be predicted, since they all have some 
visual involvement, and Group II 
patients do not. 

The present analysis may therefore 
be considered to add further evidence 
substantiating the clinical patterns 
which have previously been identified, 
and at the same time increasing the 
significance of the hierarchy obtained, 
by indicating that it is obtained in spite 
of the fact that there are visual com- 
ponents on some of the tests in the 
battery. 

Modified scoring. In order to im- 
prove the distribution of test errors 
(unpredicted passes and fails) and 
exploit the power implicit in the U 
distributions common in these tests, it 
was decided to experiment with modi- 
fied scoring. In the previous analysis, 
tests were scored on the basis of a 
perfect score versus any error. In 
modified scoring the cutting line was 
moved one unit at a time, first accepting 
0 and 1 error as passing, then 0, 1, 
and 2 errors as passing, then 0, 1, 2, 
and 3 errors as passing, to maximize 
the phi coefficients between the tests 
and the total battery. Where this 
resulted in marked improvement of the 
phi coefficient, or resulted in an
equivalent coefficient, but obtained a
more balanced distribution of difficulty 
of the tests, the modified scoring was 
adopted. The following six subtests
were so modified:


1. Reading: matching word to picture;
0, 1, and 2 errors scored passed.

2. Speech: repetition; 0, 1, 2, and 3
errors scored passed.

3. Speech: giving information; 0, 1, 2,
and 3 errors scored passed.

4. Auditory: pointing to named symbols;
0, 1, 2, and 3 errors scored passed.

5. Auditory: recognizing errors; 0, 1,
and 2 errors scored passed.

6. Auditory: following directions; 0, 1, 2,
and 3 errors scored passed.
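
The search for a cutting line can be sketched as follows. This is a simplified illustration, not the authors' procedure: it holds the total scores fixed while the cut is moved, breaks ties in total score by list position, and simply keeps the first cut with the highest phi, whereas the text also weighs the balance of difficulty levels. The function names and toy data are hypothetical.

```python
def equal_marginal_phi(passed, totals):
    """Phi between one test and the total score when the upper group
    is made the same size as the group passing the test."""
    n = len(passed)
    k = sum(passed)
    if k == 0 or k == n:
        return None  # undefined when everyone passes or everyone fails
    by_total = sorted(range(n), key=lambda i: totals[i], reverse=True)
    hits = sum(1 for i in by_total[:k] if passed[i])  # passes in the top k
    misses = k - hits
    return 1.0 - n * misses / (k * (n - k))


def best_cutting_line(errors, totals, cuts=(0, 1, 2, 3)):
    """Return (cut, phi) for the largest phi; a patient passes when
    his error count on the test does not exceed the cut."""
    scored = []
    for cut in cuts:
        passed = [e <= cut for e in errors]
        phi = equal_marginal_phi(passed, totals)
        if phi is not None:
            scored.append((cut, phi))
    return max(scored, key=lambda pair: pair[1])


# Toy data for 8 patients (not from the study).
errors_on_test = [0, 1, 0, 2, 5, 7, 1, 9]
battery_totals = [17, 16, 15, 14, 6, 4, 12, 2]
print(best_cutting_line(errors_on_test, battery_totals))  # (2, 1.0)
```

In the toy data, patients with one or two slips on the test are otherwise high scorers, so accepting up to two errors removes every misclassification.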


On three of these tests (1, 2, and 
4), the improved scalability obtained 
from modified scoring probably resulted 
from screening out mild motor and 
visual difficulties, smoothing the curve 
for the total population, but making 
the tests less discriminating for super- 
imposed visual and motor deficits. On 
Tests 3 and 5, analysis revealed 
errors piling up on specific items of 
each test; the improvement obtained 
by modification undoubtedly resulted 
from screening out the effects of the 
poorest items. This was not true of 
Test 6, however. Items on this test 
are progressive in length, and more 
patients failed the longer than the 
shorter items. It appears probable that 
modified scoring passed patients with 
mild aphasia who make only occasional 
errors. If this is the correct explana- 
tion, the diagnostic value of the test 
would be lessened, not increased, by 
modifying the scoring in this way, in 
spite of the fact that it makes the test 
more scalable. There is inevitably more 
scatter for mild aphasics because a 
much larger sample of language behav- 
ior must be observed for errors to 
appear than is necessary for moderate 
or severe aphasics. However, the errors for these
patients are characteristic ones, and obtained patterns
are consistent.

The resulting distribution of test difficulties and
phi coefficients is given in Table 5.


TABLE 5


DATA FOR TESTS WITH MODIFIED SCORING USED WHERE APPROPRIATE


                                       Phi
Tests                              Coefficient

Reading: word to picture*              85
Auditory: recognizing errors*
Auditory: recognition
Auditory: following directions*
Auditory: pointing to symbols*
Speech: giving information*
Speech: repetition of words*
Speech: serial tasks
Speech: sentence completion
Speech: naming
Speech: response to questions
Writing: letters to dictation
Speech: rhyming
Speech: expressing ideas
Speech: picture description
Speech: explaining similarities
Auditory: long paragraph
Writing: sentences to dictation

Note.—Reproducibility: 92.9; minimum reproducibility: 60.6.

* Scoring modified.

The modified scoring resulted in increases in the
reproducibility of the tests for all groups of
subjects. Reproducibility on the total sample (N=100)
was 92.9 per cent. Reproducibility on Group II (N=31)
was 93.4 per cent. Reproducibility on Group III (N=20)
was 92.2 per cent. Reproducibility on Group V (N=27)
was 93.4 per cent. Reproducibility on Groups II, III,
and V together was 91.8 per cent.


TABLE 6


DATA FOR TESTS IN THE ABBREVIATED SCALE


                                       Phi        Per Cent
Tests                              Coefficient    Passing

Reading: word to picture               84
Auditory: following directions
Auditory: pointing to symbols
Speech: giving information             62
Speech: serial responses
Speech: sentence completion
Speech: response to questions
Speech: picture description
Auditory: long paragraph

Note.—Reproducibility: 96.3.


Abbreviated scale. It was thought 
that it would be useful to set up a 
short scale of very high reproducibility 
as an index of over-all language deficit. 
For this purpose nine tests were 
selected, using modified scoring where 
appropriate. These tests were selected 
for spacing along the difficulty con- 
tinuum and for high phi coefficients. 
The application of the abbreviated 
scale to the total sample of 100 yielded 
a reproducibility of 96.3 per cent. The 
tests used were as follows: 


1. Reading: matching word to picture.

2. Speech: serial responses.

3. Speech: giving information.

4. Speech: sentence completion.

5. Auditory: pointing to symbol named.

6. Speech: answering simple questions
(one word responses).

7. Auditory: following directions.

8. Speech: picture description.

9. Auditory: long paragraph.


Difficulty levels and phi coefficients 
are shown in Table 6. 

Clinical scale. Prior to this study the senior author
had predicted a hierarchy of speech test functions
based on clinical experience with patients recovering
from aphasia. This hierarchy was composed of 7 tests
thought to be sampling almost pure language functions.
It appeared that if this reflected order of recovery in
aphasia, and if the language deficit represented a
single dimension descriptive both of states of patients
and progress of a single patient, then these tests
should prove to be highly reproducible in that order
over a cross sample of aphasic patients. The following
predicted hierarchy was tested for reproducibility:


1. Auditory: pointing to object named.

2. Speech: repetition.

3. Speech: sentence completion.

4. Speech: answering questions.

5. Speech: naming.

6. Speech: expressing ideas.

7. Speech: explaining similarities.

It was found that this battery was 95.3 per cent
reproducible on the total sample of 100 aphasics. The
difficulty levels and phi coefficients are given in
Table 7. With respect to the predicted order of
recovery, here represented as the order of difficulty,
only one error in prediction was made. Response to
simple questions and naming, which are very close in
observed difficulty, were reversed in the prediction.
This certainly represents a minimal predictive error.
(See also the discussion of difficulty below.)


TABLE 7


DATA FOR TESTS IN THE CLINICAL SCALE


                                       Phi        Per Cent
Tests                              Coefficient    Passing

Auditory: recognition
Speech: repetition
Speech: sentence completion
Speech: naming
Speech: response to questions
Speech: expressing ideas
Speech: similarities

Note.—Reproducibility: 95.3; minimum reproducibility: 66.0.

It should be noted that the clinical 
scale and the abbreviated scale, both 
of which have excellent reproducibility, 
overlap on only two tests (Speech: 
Completion, and Speech: Questions) 
and therefore derive their high repro- 
ducibilities relatively independently. 


TABLE 8


SUMMARY OF REPRODUCIBILITY


                                        Reproducibility

Best 18 tests                                89.9
Best 18 tests, scoring modified              92.9
Group II, scoring modified                   93.4
Group III, scoring modified                  92.2
Group V, scoring modified                    93.4
Groups II, III and V combined                91.8
Abbreviated scale                            96.3
Clinical scale                               95.3
New sample                                   91.8
New sample, scoring modified                 93.7











Cross-validation 


As was pointed out earlier, twenty- 
three cases were held out of the analysis 
for use in cross-validation. When one 
starts with a large number of tests, 
discards subtests, modifies the scoring, 
and then builds special scales, it is 
always possible that chance factors in 
the data will be utilized to gain spuri- 
ous scalability. (This criticism, of 
course, could not be leveled at the 
clinical hierarchy which was predicted 
in advance of the analysis). To guard 
against this possibility and to estimate 
the instability of the scaling, the 
various test batteries and modifications 
were applied to a new sample of 
patients. 

The sample consisted of twenty-three patients admitted
in 1954-1955. These patients had been assigned to the
following categories: Group I, N=1; Group II, N=6;
Group III, N=5; Group IV, N=4; Group V, N=7. Twelve
patients were over 50 years of age. Eighteen had
incurred cerebrovascular accidents, two were tumor
cases, and three were cases of encephalopathy.

The results indicated a high degree 
of confirmation of the earlier findings. 
Using pass-fail scoring on all eighteen 
tests, the test-subject matrix was 91.8 
per cent reproducible. With “modified 
scoring” on the eighteen tests (modi- 
fied exactly as before) the results were 
93.7 per cent reproducible. On the 
abbreviated scale 95.6 per cent repro- 
ducibility was found. On the clinical 
scale 94 per cent reproducibility was 
obtained. It appears dramatically 
clear that all of these sets of tests have 
high reproducibility and that repro- 
ducibility was retained on a new 
sample of patients. 

The reproducibility data for the 
various scales and groups are summa- 
rized for convenience in Table 8. 
Comments Concerning the Methods
Employed 


The use of the phi coefficient to select 
tests for consideration in the scale 
analysis has been questioned on the 
basis that this procedure automatically 
rejects all but one or two variables, 
and may therefore have excluded 
information related to rarer forms of 
aphasia, and, secondly, that this pro- 
cedure “forces” the data to conform 
to a Guttman scale, making the 
subsequent analysis merely the elabora- 
tion of a tautology. 

The writers concede the first possi- 
bility. If there were a few tests which 
operated to detect a deviant form of 
aphasia but not the general dimension 
dealt with here, it is probable that these 
would be discarded at the early stages 
of analysis, and their deviation from 
the scalable pattern would not become 
obvious. It is similarly true that if 
only a few patients exhibited a deviant 
form of language deficit this would 
not affect the over-all reproducibility 
sufficiently to render the general dimen- 
sion unscalable. Finally, it is true that 
the original battery of tests may be 
insensitive to special forms of aphasia, 
or even have neglected some crucial 
area entirely. The reader is invited to 
consider the original listing and 
description of the tests to decide the 
likelihood of these alternatives for 
himself. The writers believe that the 
rejected tests were simply poorer 
measuring instruments of language 
deficit, either through the unreliability 
of the test (e.g., tests susceptible to 
guessing or accidental success), or 
through the overriding influence of 
other factors (e.g., high visual or 
motor requirements), or that they were
of a high enough level of difficulty to 
be relatively unstable with the aphasic 
population. In any comprehensive 
battery of language tests it is impossible
to avoid some overlap with educational
and intellectual levels, since some
illiterates and some mental defectives
appear in aphasic as in other
populations.

With reference to the question of 
whether the phi coefficient analysis 
“forces” a successful scale, the answer 
is unequivocally no. This may be 
demonstrated by considering almost 
any classroom achievement test.
Ordinarily in such a test many items 
will be found with coefficients of the 
same magnitude as those used in this 
study, but typically such items will 
not constitute an adequate Guttman 
scale. In the field of attitude measure- 
ment such results have often been 
found. Clark and Kriedt (1948), work- 
ing with an established attitude scale 
composed of items of high internal 
validity (determined by an index 
similar to the phi coefficient used here), 
illustrated in detail differences which 
may exist between scales whose items 
were selected by indices similar to the 
phi coefficient and scales whose items 
were directly selected to maximize 
reproducibility. Briefly, it may be 
said that phi coefficients may be large 
for a variety of reasons. If the total 
battery of tests under consideration is 
complex, any test may relate highly to 
the battery because it is complex in the 
same fashion as the battery, or because 
it samples one important part of the 
complex very heavily, or because it 
samples several parts of the complex 
thoroughly, etc. Nothing “forces” 
such tests to scale. It is perhaps en- 
lightening in this connection to consider 
the amount of variance “accounted 
for” by even what appears to be a 
“high” phi coefficient. The phi coef- 
ficient was used here to select tests 
with which it would be most profitable 
to work if the universe were scalable. 
It did not guarantee reproducibility in 
any sense except the negative one that 
if most of the phi coefficients had been
low, it would have been obvious that 
scalability could not be achieved. 
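The remark about the variance "accounted for" by a seemingly high phi coefficient can be made concrete with a short sketch. It assumes each test is dichotomized against a dichotomized criterion (for example, the rest of the battery), giving a 2 x 2 table; the cell counts below are invented for illustration and are not data from the study.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table laid out as:

                   criterion pass   criterion fail
        test pass        a                b
        test fail        c                d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a margin is empty")
    return (a * d - b * c) / denom


# Hypothetical counts for 100 patients.
phi = phi_coefficient(40, 10, 10, 40)
shared_variance = phi ** 2
print(phi, shared_variance)
```

With these counts phi is .60, yet its square, the proportion of shared variance, is only .36, which leaves ample room for a set of such tests to fail to scale.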

The question of factorial purity of 
the dimension discussed here may well 
be raised. It is conceivable that the 
dimension may be factorially complex 
when viewed in the context of many 
other tests. At the present time the 
authors are of the opinion that this 
question can only be decided by further 
empirical studies. The work of Gage 
on opinion poll data (1947) indicates 
that at least under some circumstances, 
the Guttman technique may isolate the 
items in a scale which are identical 
with those items appearing on the 
major factor in a factor analysis. 
Whether this is true of the present 
findings is a point to be resolved by 
future work. The fact that a dimen- 
sion of language deficit can be isolated 
and scaled across a wide variety of 
behaviors in a broad aphasic sample 
suggests that it may have considerable 
pragmatic importance regardless of 
its factorial composition. 


Implication for Future Research 


This study has been concerned with 
what Loevinger (1957) has called 
the substantive component and the 
structural component of a battery of 
tests of language deficit. We have 
argued that the content is that of 
general language behavior, and the 
structure found is compatible with the 
hypothesis of unidimensionality. Future 
work seems to fall in these categories: 

First, the content should be subjected 
to further careful scrutiny. If dimen- 
sions of language behavior have been 
overlooked or eliminated from this 
analysis, they should be incorporated 
in future research, along with tests 
already known to scale. 

Secondly, different and more sensi- 
tive techniques should be employed 
for the analysis of structure of this
battery. The Guttman technique has
been criticized as crude, especially 
when applied in a manner similar to 
that used in this study (specifically, 
permitting the elimination of tests). 
Further and more detailed study of 
the test-to-test relationships present in 
the battery seems to be indicated, and 
factor analysis appears to be one 
fruitful method of proceeding. 

Finally, and most important, what 
Loevinger calls the external component 
of the battery must be considered. We 
see as immediate problems the relation 
of degree of language deficit to rate of 
recovery from aphasia, to observations 
of free or spontaneous speech of apha- 
sics, and to etiology, locus, and extent 
of brain damage. We do not expect 
that simple relationships will be found, 
particularly with respect to the last 
relationships, but see these as areas 
of investigation of primary importance. 


Interpretation 


It appears to the writers that the 
obtained results constitute an im- 
pressive argument for the consideration 
of the language deficit in aphasia as a 
unidimensional trait. The eighteen 
tests constituting the first scale include 
heterogeneous modes of responding. 
Stimuli are presented through visual 
channels (written words, pictures, 
objects, and printed symbols), auditory 
channels (single spoken words, names, 
directions, questions, dictation, and 
stories), and by “general directions” 
involving gesture, examples, and the 
like. Responses are similarly widely 
scattered through gross motor areas 
(pointing, nodding, matching, moving 
objects), fine motor areas (writ- 
ing, oral repetition), and “high 
level” speech functions (identifying 
errors, completing sentences, answering 
questions, and describing pictures). 
When such diverse materials prove 
to be scalable, there seems to be little 
need for classifications which require
separate groupings and imply inde- 
pendence of individual language func- 
tions. It is instructive to attempt 
to find in these data examples of 
“pure word deafness,” “pure word 
blindness,” “nominal aphasia,” “expres- 
sive aphasia,” “receptive aphasia,” 
and the like existing independently of 
general language deficit. Such cases 
must either be rare or nonexistent. 
The evidence reported here places the 
burden of proof on the investigator 
who believes that such concepts lend 
any precision to the description of 
language impairment. 

A word of caution about the inter- 
pretation of the present findings is in 
order. It is tempting to look at the 
order of difficulty of tests and set up 
a hierarchy of language functions 
from “elementary” to “complex,” 
assuming that matching is the simplest 
process, auditory recognition the next, 
and so on, up to stating similarities,
comprehending a long paragraph, and
writing sentences to dictation. Such
a functional hierarchy cannot be justi- 
fied from these data. The hierarchy 
is, so to speak, in the language level 
of the patients and not in the
“functions” represented in the tests.
The level of difficulty of each test is
the result of (a) the degree of the
deficit in the patients, (b) the “raw
difficulty” of the material, (c) the 
nature of the task, and (d) the scoring 
process. Consider the tests involving 
matching words to pictures, pointing 
to the picture named by the examiner, 
and naming pictures. In these tests 
the words are all selected from the 
first thousand most frequently used 
words in the Thorndike-Lorge lists 
(1944; Schuell, 1954). The tests 
could obviously be made more difficult 
by using less common words (e.g., 
“Point to the ibex” instead of “Point 
to the dog”). The tests could be made
less difficult by more lenient scoring,
or by avoiding common associations 
such as chair-table, girl-boy, and horse- 
dog, which aphasic patients are par- 
ticularly likely to confuse. At the 
other extreme, similarities could be 
made much more difficult (as such 
items are in intelligence tests) or more 
simple. Selected levels of the tests used 
in the Minnesota battery were the 
results of clinical experimentation and 
analysis of responses, carried on over 
a period of almost 10 years. An effort 
was made to steer a middle course, 
discarding tests which were highly 
influenced by intelligence and educa- 
tional levels, yet maintaining a battery 
which was sensitive enough to permit 
even mild aphasic disabilities to appear. 
The success of “modified scoring” in 
shifting the difficulty level of tests while 
maintaining high scalability indicates 
that many tests function effectively over 
several levels of the language hierarchy, 
rather than marking a particular level 
by a task of one particular nature. 

It seems reasonable to suppose that 
each particular task has some lower 
boundary of ease and simplicity, but it 
seems unlikely that a particular upper 
boundary exists. In any event this 
study was not designed to furnish 
specific information on the relation 
of the nature of language tasks to 
specific levels in the language hier- 
archy, and does so only in the most 
general manner. Such relationships 
(if they exist) must be sought in 
future research. 

An additional caveat should perhaps 
be stated. The reader should not 
assume that the tests in any of the 
scales given above are sufficient to 
form a test battery for diagnosis and 
prognosis in aphasia. It obviously is
important to have tests with very high 
visual requirements, to test for motor 
malfunctions and hearing loss, and to 
investigate the patients’ functioning 
on tests of mixed abilities which are
important in everyday life (for example,
spatial orientation and mathematical
competence).

It should be re-emphasized here that 
the classification system described 
earlier in the paper is actually based 
on the presence or absence of specific 
motor or perceptual deficits which may 
or may not co-exist with the basic 
language deficit found in all the clinical 
groups described, and that while 
prognosis appears to be related to the 
over-all pattern of injury incurred in 
the brain, the recovery of language, 
when it occurs, progresses systemati- 
cally and relatively independently of 
the superimposed defects, as indicated 
by the congruence of clinically observed 
recovery patterns and ordering of tests 
obtained on the clinical scale. 

What is asserted is that the tests 
are sufficient to appraise the “pure” 
language deficit present in aphasia, 
and that this outcome is more com- 
patible with the theory of a single 
dimension of language deficit than with 
the multiple dimensions or topologies 
suggested in the past. 


SUMMARY 


For more than 80 years aphasia has 
been described in terms of types of 
speech or comprehension disorders. 
Such types have frequently been 
ascribed to sites of localized brain 
damage. The more recent work has 
found less and less reason to maintain 
such topological schemes. The present 
paper attempts to substantiate the 
hypothesis that there is a single dimen- 
sion of language deficit which may be 
identified in all aphasic patients, and 
which is relatively independent of gross 
motor or perceptual deficits which may 
or may not be present also. 

The records of 100 aphasia patients 
tested on Forms 4 and 6 of the 
Minnesota Test for Differential Diagnosis
of Aphasia were examined, while
those of 23 patients tested on Form 5 
were withheld and analyzed indepen- 
dently for cross-validation. Guttman’s 
scale analysis was used to test for a 
single hierarchy or homogeneous uni- 
verse of language deficit. 

Eighteen tests selected on the basis 
of high phi coefficients, goodness of 
errors, and range of difficulty were 
scaled on a simple pass-fail basis, and 
then with modified scoring. Tests 
were scaled independently for sub- 
groups of patients. Finally, two short 
scales were constructed, the first con- 
sisting of an abbreviated scale of nine 
tests selected for high phi coefficients 
and wide range of difficulty; the 
second, a clinical scale of seven tests 
considered to represent an orderly 
progression in recovery. 

For all groups and test batteries the 
data yielded high coefficients of repro- 
ducibility which were markedly higher 
than the minimum reproducibility in- 
dices. Cross-validation gave strong 
confirmation of the results on the 
original sample. 

The evidence is compatible with the 
hypothesis of a single dimension of 
language deficit present in all aphasia. 
In 1938 Skinner suggested in effect
that an initially neutral stimulus be- 
comes a secondary reinforcer when it 
becomes a discriminative stimulus or 
cue. That is to say, after an S has 
been trained to make some particular 
response following the onset of a 
stimulus (illustrating a cue effect), 
that stimulus will operate to 
strengthen another response which
precedes its onset (illustrating a rein- 
forcing effect). In the intervening 20 
years we have seen almost no further 
progress in the development of a 
theory of secondary reinforcement. 
Schoenfeld, Antonitis, and Bersh
(1950) made a somewhat stronger 
statement in hypothesizing that estab- 
lishing cue properties is a necessary 
and sufficient condition for secondary 
reinforcement. Similar statements 
have appeared in various other places. 
For example, in 1943 Hull stated: 


Stimuli which acquire secondary reinforcing
power seem always to acquire at the same time
a conditioned tendency to evoke an associated
reaction. ... It is probable ... that a
stimulus gradually acquires its powers of 
secondary reinforcement as it acquires its 
power of evoking the reaction conditioned to 
it (Hull, 1943, p. 97). 


(Since then Hull [1951] has shifted 
to an interpretation in terms of 
anticipatory drive reduction.)

At first look it seems curious that so 
little progress has been made toward 
a quantitative theory which will in- 
clude secondary reinforcement. We 


1 The work presented here was greatly
facilitated by a fellowship from the Center for 
Advanced Study in the Behavioral Sciences, 
1955-56. 


note, for example, that a quantitative
statement is conspicuously absent 
from Hull’s extensive formulations 
(1943, 1951). This gap in our the- 
orizing may become more understand- 
able as we examine the inventory of 
available methods and data. 

We will now turn to a brief exami- 
nation of two general sources of infor- 
mation regarding the secondary rein- 
forcement process, which we will term 
“direct” and “indirect” evidence.
Our purpose at this point will be to 
point up the main factors which have 
probably served as deterrents to 
theory development. In the subsequent
discussion we will raise a slightly
different question regarding the extent
to which these factors reflect
real obstacles, and will proceed to 
explore one path which remains open. 


INDIRECT EVIDENCE 


On one hand, there is a considerable 
amount of indirect evidence on the 
process of secondary reinforcement 
which comes from theoretical interpretations
of more complex experiments.
The interpretations of experiments
on delay of reinforcement
(Spence, 1947) and chaining (Ferster
& Skinner, 1957) provide particularly
good examples. We will also wish to
consider observing response experiments
in this connection (Kelleher,
1958; Prokasy, 1956; Wyckoff, 1951,
1952). The findings in these experi- 
ments yield to a highly parsimonious 
interpretation if we assume a second- 
ary reinforcing process. If such inter- 
pretations are accepted, secondary 





TOWARD A QUANTITATIVE THEORY OF SECONDARY REINFORCEMENT 69 


reinforcement becomes a factor which 
exercises considerable control over 
behavior according to a consistent 
and lawful scheme. We are inclined 
to think of these interpretations as 
“uses of” rather than “support for”
or “clarification of” the process of secondary
reinforcement. This distinction
disappears in the last analysis,
since usefulness must be the ultimate 
measure of value of a concept. How- 
ever, there is a reluctance to base a 
quantitative theory on this kind of 
indirect evidence. One would prefer 
to start with the purest and most 
direct evidence. 


DIRECT EVIDENCE


On the other hand, experiments 
aimed at obtaining “pure” secondary
reinforcing effects have yielded results 
which are somewhat intractable, first 
because of the apparent feebleness 
of the obtained effects, and second be- 
cause of the heavy dependence on 
specific features of the experimental 
situation. In the following para- 
graphs we will focus attention on one 
particularly important source of difficulty
which affects the interpretation
of a large body of experimental work. 
A more extensive review of recent de- 
velopments, which provides numerous 
illustrations of the type of difficulty 
suggested above, has been presented 
by Meyers (1958). 

The experimental paradigm which 
is conceptually the most straightfor- 
ward consists of first pairing an initi- 
ally neutral stimulus with primary re- 
inforcement and then attempting to 
condition some arbitrary response, 
using this stimulus in place of rein- 
forcement. The use of this paradigm 
in a Skinner box situation has typi- 
cally yielded either no secondary rein- 
forcing effects at all or effects which 
just barely reach statistical significance
(Bersh, 1951; Estes, 1949;
Schoenfeld et al., 1950; Wyckoff,
Sidowski, & Chambliss, 1958). It is 
probably no accident that this pro- 
cedure has generally been abandoned, 
usually in favor of a paradigm in 
which the subject is first conditioned 
to make some response by the use of 
primary reinforcement paired with 
the initially neutral stimulus. Sec- 
ondary reinforcement is then tested 
by observing retardation of extinction 
when primary reinforcement is discontinued
(Bersh, 1951; Bugelski,
1949; Melching, 1954; Miles, 1956). 
However, certain contaminating fac- 
tors inherent in this latter paradigm 
have become increasingly evident. 

Experiments and related discussion 
by Melching (1954), Bugelski (1956), 
Elam, Tyler, & Bitterman (1954), and 
Wyckoff et al. (1958) have indicated 
strongly that the apparent secondary 
reinforcing effects obtained with this 
paradigm (as well as others) may be 
partially or entirely due to cue effects 
rather than secondary reinforcement. 

In making this distinction we as- 
sume that, by definition, reinforcement 
is an effect on a response which pre- 
cedes the onset, whereas cue effects 
are revealed by behavior which follows 
the onset of a stimulus. This dis- 
tinction and some of its implications 
have been discussed in detail elsewhere 
(Wyckoff et al., 1958). In the experi- 
ments in question, these factors are 
obviously confounded because re- 
sponses and stimulation occur alter- 
nately during the test. However, 
some clues to the relative contribution 
of each can be obtained in various 
ways. 

In Melching’s (1954) experiment a 
buzz followed lever pressing all of the 
time, half of the time, or none of the 
time during acquisition and extinction 
in a Skinner-box situation. A fac- 
torial design was used. The results 
show a close correspondence between 
resistance to extinction and the similarity
between stimulating conditions
during acquisition and extinction.2
Melching points out in effect that 
these findings indicate a strong con- 
tamination of secondary reinforcement 
by cue effects. He suggests that this 
contamination could be avoided by the 
use of the paradigm considered earlier, 
in which the secondary reinforcement 
is used to condition a new response 
rather than to maintain a conditioned 
response. Actually, even this measure
would not insure freedom from
contamination by cue effects (Wyckoff
et al., 1958), although it would
certainly tend to reduce the effect. 
Incidentally it seems to reduce group 
differences as well. 

Elam, Tyler, and Bitterman (1954) 
devised an experiment in which cue 
and secondary reinforcing effects were 
placed in opposition to each other. 
A neutral runway led to food in a
black goal box or no food in a white
goal box at random during initial
conditioning. (Opposite colors were
used in counterbalancing groups.) 
Extinction tests were run using 
either all white or all black goal boxes. 
Greater resistance to extinction was 
found with extinction to the white 
(previously nonreinforced) goal box. 
Since any secondary reinforcing effects 
present would be expected to favor 
running to the black goal box, the obtained
opposite result indicates the
operation of some other factor. The
authors interpret the results as supporting
a “discrimination hypothesis”
which refers to the facilitating effect 


2 The profound effects which may result 
from subtle similarities and differences in 
stimulus conditions between acquisition and 
extinction have been brought to light in the 
context of studies on the influence of inter- 
mittent reinforcement on extinction (Crum, 
Brown, & Bitterman, 1951; Schoenfeld et al., 
1950; Spence, 1947; Tyler, Weinstock, & 
Amsel, 1957). 


on extinction of “any discriminable
change in the afferent consequences of
response.” This hypothesis would
seem to imply differential cue effects
associated with empty white or empty
black goal boxes encountered during
extinction.3

Bugelski (1956, p. 93) takes notice 
of the role of cue effects in secondary 
reinforcement experiments. He elu- 
cidates in particularly clear detail how 
a stimulus previously associated with 
reinforced responses could produce 
increased resistance to extinction 
simply through its action as a cue. 
He observes that a separate concept 
of secondary reinforcement is super- 
fluous in such cases and suggests that 
the term be discarded. 

Wyckoff, Sidowski, and Chambliss 
(1958) attempted to condition lever 
pressing in a Skinner box, using a 
buzzer as a secondary reinforcer, after 
subjects had been given extensive 
training to approach a water dipper in 
response to the buzzer. They intro- 
duced a control procedure which would 
seem to be effective in isolating second- 
ary reinforcing effects. Control group 
subjects received the buzzer when they 
failed to press the lever for an interval. 
Since this procedure eliminates sec- 
ondary reinforcement but not cue 
effects, differences between experi- 
mental and control groups would have 
to be attributed to secondary rein- 
forcement. Unfortunately, no differ- 
ences between groups were obtained. 

Thus we see varied indications that 
cue effects may be quite potent as 
compared to secondary reinforcement 


3 Actually it is difficult to assess the position 
of Elam, Tyler, and Bitterman with regard to 
the operation of events in the goal box as 
stimuli affecting behavior on subsequent
trials. On one hand, the discrimination of
their “discrimination hypothesis” seems to be
based on such stimuli and associated re- 
sponses, but elsewhere they argue that the 
trial spacing effectively precludes such effects. 
in a certain group of experiments,
dramatizing the necessity for certain 
controls. Although the notion that 
secondary reinforcement is entirely 
artifactual is suggested or implied in 
several places, it should be noted that 
none of these findings demand this 
conclusion. Furthermore, there are 
still a number of findings supporting 
the secondary reinforcement concept 
which do not seem to be weakened by 
these considerations. The findings of 
Saltzman’s experiment (1949), in 
which Ss were initially trained in a 
runway and were tested in a T maze, 
Zimmerman’s (1957) achievement of
prolonged secondary reinforcing
effects through the use of intermittent
reinforcement in a Skinner box, and
the findings in a number of experiments
on “token rewards” (Wolfe,
1936) would be difficult to explain 
without some kind of secondary rein- 
forcement process. In general, these 
relatively direct secondary reinforce- 
ment experiments have not provided 
the kind of orderly detailed data which 
is conducive to quantitative theorizing. 

Let us return for a moment to the 
question of why so little progress has 
been made in the development of sec- 
ondary reinforcement theory. Our 
brief review of two general sources of 
information regarding secondary rein- 
forcement yields the following general 
picture. We see a body of indirect 
evidence which suggests a powerful
and orderly influence of secondary 
reinforcement. However, there exists 
a natural bias against the use of in- 
direct data as a point of departure. 
Turning to what we have called direct 
evidence, we find a body of data which 
is frustrating to the serious theorist 
to the extent that several have sug- 
gested that the concept may be use- 
less. The lack of progress in the area 
can probably be attributed to the net 
result of this dilemma. 


At this point it is natural to suggest 
that perhaps we have been too hasty 
in rejecting the body of indirect evi- 
dence as a starting point. The re- 
mainder of the present paper presents 
an attempt to make systematic use of 
certain indirect evidence in the devel- 
opment of a quantitative theory of 
secondary reinforcement. 


SPECIFICATION OF A CLASS 
OF ACCEPTABLE THEORIES 


The general strategy of the present
development is as follows. We
will analyze the operation of secondary 
reinforcement in a particular experi- 
ment—namely, an observing response 
experiment—in which two sources of 
secondary reinforcement operate in 
opposition to each other. The condi- 
tions are such that quantitative treat- 
ment becomes necessary before we 
can even make a qualitative state- 
ment regarding the outcome. Con- 
versely, we will see that the qualitative 
result places certain restrictions on 
acceptable quantitative theories. Our 
approach will depart from the conven- 
tional hypothetical-deductive method 
in one important respect. Rather 
than picking a specific theory and at- 
tempting to show its success in meet- 
ing the empirical requirements, we will 
begin with the empirical requirements 
along with a certain amount of formal 
structure and attempt to circumscribe 
a set of acceptable theories. We 
visualize a process of “closing in on”
a specific theory. 


FORMAL STRUCTURE 


As a point of departure we return 
to the familiar hypothesis that the 
secondary reinforcing effects of a 
stimulus are somehow related to the 
strength of that stimulus as a cue. 
We make special note of the fact that 
this hypothesis leaves open the question
of the specific functional relationship
between these two variables, and
hence represents a fairly broad general 
hypothesis. At this point we wish to 
make a more precise statement of the 
hypothesis, still retaining this element 
of generality. To this end we intro- 
duce the following definitions and 
assumptions: 


Definitions. 


$S_i$-$R_i$: A reflex (or habit), any stimulus-response
pair.

$h$: A unit of time chosen small enough
so that the probability of more
than one response occurring
within the same unit of time is
negligible.

$p_i$: Strength of a reflex, the probability
of $R_i$ in an interval of length $h$,
given that $S_i$ is present. This
variable corresponds to the ratio
x/S of Estes’ statistical learning
theory (1950), but further treatment
of its nature is not needed
here. For the present purpose
$p_i$ will also represent the cue
strength of $S_i$, although additional
considerations would be
required if alternative responses
to $S_i$ were involved (see Wyckoff,
1954).

$r_i$: Secondary reinforcing value of
$S_i$. The quantitative definition
of $r_i$ is inherent in the following
assumptions.


Assumptions. 


Assumption 1. If $S_i$-$R_i$ results in
the onset of a new stimulus $S_j$, $p_i$ is
changed according to:

$$\Delta p_i = \theta (r_j - p_i) \qquad (1)$$

where $\theta$ is a constant between 0 and 1.
The relationship of this assumption to
more familiar learning functions is
readily shown by considering the case
of a constant reinforcement, say, $r$.

FIG. 1. [Diagram: the reflex $S_i$-$R_i$ followed by $S_j$-$R_j$,
with $r_j = f(p_j)$ acting on the preceding reflex.]
The stimulus $S_j$ has a secondary
reinforcing effect on the preceding reflex $S_i$-$R_i$.
The reinforcing effect is determined by $p_j$, the
cue strength of $S_j$, and produces a change in $p_i$.


In this case the above difference equation
can be solved to yield the familiar
exponential learning function with
an asymptote equal to $r$.
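The constant-reinforcement case can be checked numerically. The sketch below iterates the difference equation of Assumption 1 and compares it with the closed-form solution $p(n) = r - (r - p(0))(1 - \theta)^n$, which follows by repeated substitution. The function names and the parameter values are illustrative assumptions only.

```python
def iterate(p_start, r, theta, n):
    """Apply the update p <- p + theta * (r - p) for n trials."""
    p = p_start
    for _ in range(n):
        p += theta * (r - p)
    return p


def closed_form(p_start, r, theta, n):
    """Exponential learning function with asymptote r."""
    return r - (r - p_start) * (1.0 - theta) ** n


# Hypothetical values: starting strength .1, reinforcement .8, theta .2.
for n in (0, 1, 5, 10, 50):
    print(n, round(iterate(0.1, 0.8, 0.2, n), 4),
          round(closed_form(0.1, 0.8, 0.2, n), 4))
```

The two columns agree trial by trial, and both approach the asymptote $r$ as the number of trials grows.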

Assumption 2. $0 < r < 1$. This
assumption insures that $p_i$ (a probability)
will remain in the open interval
from 0 to 1.

The general hypothesis of a func- 
tional relationship between cue and 
reinforcing properties of a stimulus 
can now be expressed as: 


$$r_i = f(p_i) \qquad (2)$$


In line with the approach outlined 
above, we will not adopt a specific 
function for $f(p_i)$. However, it is
probably safe to assume the following 
properties. 

Assumption 3. $f(p_i)$ is a continuous
and differentiable nondecreasing
function. 

It is convenient to place primary 
reinforcement into the same frame- 
work by assuming that a primary re- 
inforcing stimulus has a relatively 
high cue strength which depends on 
drive conditions and the nature of the 
reinforcement. Primary reinforcement
then becomes simply another
stimulus in the ongoing chain.4 The 
concepts involved are represented 
diagrammatically in Fig. 1. 

4 Much of the structure utilized here should
be credited to Estes (1950). However, our
use of the “uncommitted” function $f(p_i)$
reflects a difference in the general approach.
We will now analyze the operation
of secondary reinforcement in a particular
experiment with the objective
of showing restrictions on the function
$f(p_i)$ implied by the experimental
results.


THE OBSERVING RESPONSE 
EXPERIMENT 


The relevant features of observing 
response experiments are illustrated 
very nicely by Prokasy’s (1956) 
T-maze experiment. Rats were rein- 
forced half of the time at random in 
either goal box after 30 sec. detention 
in a delay chamber. The delay chamber
on the left was white on reinforced
trials and black on nonreinforced
trials. (Opposite colors and sides
were used in counterbalancing groups.)
Forced trials insured equal experience 
with both sides. Chambers on the 
right and left were made distinct by 
floors of different textures. A left 
turn represented an observing re- 
sponse as defined elsewhere (Prokasy, 
1956; Wyckoff, 1952). A clear pref- 
erence for the observing response was 
obtained asymptotically. 

Since this experiment is a special 
case of a delayed reinforcement ex- 
periment, the Spence-Hull (Spence, 
1947; Hull, 1943) analysis, which 
seems to be generally accepted, is 
directly applicable and implies that 
the learning of the response at the 
choice point is entirely dependent on 
immediate secondary reinforcement 
mediated by the delay stimuli. In 
the present case, the picture is compli- 
cated by the fact that there are three 
distinct kinds of delay stimuli. These 
can be called positive (white on the 
left in our example), negative (black 
on the left), and neutral (either color 
on the right). We will now see that 
two sources of secondary reinforce- 
ment are pitted against each other. 

On the one hand, a left turn leads to a
positive or negative delay chamber at
random. The positive chamber
would acquire secondary reinforcing 
value, since leaving this chamber al- 
ways leads to food. On the other 
hand, a right turn will always lead to 
a neutral delay chamber. This cham- 
ber would presumably also enjoy some 
secondary reinforcing value, since 
leaving it leads to food on half of the 
trials. Thus we have a relatively 
strong reinforcement on half of the 
trials for left turns pitted against an 
intermediate reinforcing effect for all 
right turns. In view of the experi- 
mental findings, an acceptable second- 
ary reinforcement theory must resolve 
this conflict in favor of left turns. 
Although the present analysis is made 
with reference to this particular study, 
other observing response experiments 
using procedures which are radically 
different in details, such as species and 
type of reinforcement, have yielded
equivalent results (Kelleher, 1958;
Wyckoff: 1951, 1952), suggesting that
the findings are not dependent on the
particular parameters of this one
experiment.

Applying the present formulation,
let us first consider the cue strength of
the positive, negative, and neutral
delay chambers, which we will denote
as $p_a$, $p_b$, and $p_n$, respectively. The
cue strength of primary reinforcement
and of an empty goal box are assumed
to be fixed and will be denoted as $p_1'$
and $p_0'$, respectively. Let $p_1 = f(p_1')$
and $p_0 = f(p_0')$. It is implied that
$p_1 > p_0$. Under these conditions it
can be shown that as the number of
trials $N$ increases:

$$\lim_{N \to \infty} p_a = p_1 \tag{3}$$

$$\lim_{N \to \infty} p_b = p_0 \tag{4}$$

$$\lim_{N \to \infty} p_n \approx .5(p_1 + p_0) \tag{5}$$

Actually $p_n$ will not reach a fixed
value but will fluctuate above and
below a mean value of $.5(p_1 + p_0)$ by
an amount depending on $\theta$. Use of
this mean value as an approximation
will simplify the development and will
not alter the conclusions.

We now consider the strength of the
responses of observing and nonobserving
(left and right turn), which we
designate as $p_L$ and $p_R$. As the number
of trials increases we find that:

$$\lim_{N \to \infty} p_L = .5(f(p_a) + f(p_b)) \tag{6}$$

$$\lim_{N \to \infty} p_R \approx f(p_n) \tag{7}$$

Substituting limiting values from
Equations 3, 4, and 5, we obtain:

$$\lim_{N \to \infty} p_L = .5(f(p_1) + f(p_0)) \tag{8}$$

$$\lim_{N \to \infty} p_R \approx f(.5(p_1 + p_0)) \tag{9}$$

The experimental finding of a preference
for the observing response implies
asymptotic values of $p_R < p_L$
and hence:

$$f(.5(p_1 + p_0)) < .5(f(p_1) + f(p_0)) \tag{10}$$

The values of $p_0$ and $p_1$ in this
inequality represent the values which
apply to the Prokasy experiment, although
we have noted some grounds
for expecting the same outcome over a
wide range of values.


SPECIFICATION OF ACCEPTABLE 
FUNCTIONS 


This inequality (Equation 10) im- 
mediately places certain restrictions 
on acceptable functions. First, the 
function f(x) = x is ruled out along 
with all other straight lines, since the 
use of a linear function produces 
equality between the two sides of 
Equation 10, rather than the required 
inequality. (The function $f(x) = x$
was utilized by the author in a previous
model but is now seen to be inadequate
in the present framework
[Wyckoff, 1954].) We can also eliminate
functions which are uniformly
negatively accelerated, since this condition
implies a greater mean slope
in the interval $p_0 < x < .5(p_0 + p_1)$
than in the interval $.5(p_1 + p_0) < x < p_1$,
hence $f(.5(p_1 + p_0)) > .5(f(p_1) + f(p_0))$,
contradicting Equation 10.
This establishes that the function
must show positive acceleration, at
least in some region. Perhaps the
simplest function satisfying our requirements
is a uniformly positively
accelerated curve such as $f(x) = x^2$.
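The restriction imposed by Equation 10 can be checked numerically. The following is only a sketch under stated assumptions: the cue-strength values `P0` and `P1` are hypothetical (the paper reports no numerical values), and the square-root function is chosen here merely as one example of a uniformly negatively accelerated curve.

```python
import math


def midpoint_gap(f, p0, p1):
    """Return .5*(f(p1) + f(p0)) - f(.5*(p1 + p0)).

    A positive gap means the inequality of Equation 10 holds, i.e. the
    observing response (left turn) ends up stronger than the right turn.
    """
    return 0.5 * (f(p1) + f(p0)) - f(0.5 * (p1 + p0))


# Hypothetical cue strengths, chosen only for illustration.
P0, P1 = 0.2, 0.9

candidates = {
    "linear, f(x) = x": lambda x: x,
    "negatively accelerated, f(x) = sqrt(x)": math.sqrt,
    "positively accelerated, f(x) = x**2": lambda x: x ** 2,
}

for name, f in candidates.items():
    gap = midpoint_gap(f, P0, P1)
    verdict = "acceptable" if gap > 1e-12 else "ruled out"
    print(f"{name}: gap = {gap:+.4f} ({verdict})")
```

With these illustrative values the linear function gives a gap of zero, the square-root function a negative gap, and the squaring function a positive gap, in line with the argument of the text.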

The class of acceptable functions 
can be illustrated graphically by refer- 
ring to Fig. 2. The coordinates of 
the points A and C represent the
values of $p_1$, $p_0$, $f(p_1)$, and $f(p_0)$ applying
in the Prokasy experiment.
A reference line is drawn between A 
and C, and the midpoint of this line is 
designated B. Acceptable functions 
consist of continuous, nondecreasing 
curves running from A to C and pass- 
ing below the point B. Two freehand 
curves illustrate the possibilities of a 
uniformly accelerated curve and an 
inflected curve. 

We note that curves meeting this 
criterion must have a relatively steep 
segment somewhere in the upper
region. We may also note further 
restrictions which would be placed on 
this function if we wished to make cer- 
tain further assumptions. For ex- 
ample, it might be reasonable to as- 
sume that the outcome of the experi- 
ment could not be reversed by altering 
the amount of reinforcement or the 
reinforcing value of the empty goal 
box. We might further assume that 
the direction of the outcome would 
not be reversed if other probabilities 
of reinforcement were used, as long as 
the probability remained the same on 
both sides. These assumptions would 
imply a curve which shows an acceler- 
ation greater than or equal to zero 
throughout. 

We have demonstrated the exist- 
ence of functions meeting the imposed 
requirements and have specified some 
of the properties of such functions.
A more specific quantitative model 
could be derived from the present de- 
velopment by adopting arbitrary func- 
tions from the acceptable set. Quan- 
titative curve fitting could then be 
attempted. 

Further development along these 
lines holds some potential interest. 
However, the writer is inclined to be- 
lieve that it will be more efficient to 
postpone further quantitative com- 
mitments until data bearing more 
directly on the nature of the function 
can be obtained. 

At this point it will be of interest to 
see how some of the qualitative im- 
plications of the formulation can be 
related to other experiments. We 
now turn to a couple of brief examples. 


EXAMPLES OF IMPLICATIONS FOR 
OTHER EXPERIMENTS 


The first is a “chaining” experiment
using pigeons in a Skinner box (Ferster
& Skinner, 1957, p. 679). In this
experiment an S could bring about the 
onset of a stimulus by pecking a key, 
and could obtain primary reinforce- 
ment by further pecking in the pres- 
ence of the stimulus. 

A variable interval schedule was 
used and measures of rate of respond- 
ing were obtained in both “links” of 
the chain; i.e., both before (first link) 
and after (second link) the onset of 
the stimulus. Responding in the first 
and second links of the chain can be 
attributed to secondary reinforcing 
and cue properties of the stimulus 
respectively. 

After extensive training, satiation 
curves were obtained in a continuous 
session. According to the present 
formulation a decline in the cue 
strength of a stimulus should be ac- 
companied by an even more rapid 
decline in its secondary reinforcing 
value. Hence, as motivation de- 
creased in this experiment, we would 
expect the rate of responding in the 
first link to decline more rapidly than 
that in the second link. The results 
conform to this expectation in a very 
striking way. Data were available 
for a six hour interval for each of two 
Ss. In the first two hours the rate in
the first link was roughly 50% of that
in the second. In the last two hours
the rate in the second link averaged
73% of its initial value, whereas the 
rate in the first link had dropped to an 
average of 15% of its initial value. 
Similar results have been obtained in 
some exploratory chaining experi- 
ments with rats in our laboratory. 

A somewhat different kind of ex- 
periment, to which the present formu- 
lation can be applied, is reported by 
Leventhal (1955). In this experi- 
ment rats were run in a T maze with 
a 30 sec. delay in either arm. The 
left goal box was baited with one food 
pellet on every trial, while the right
goal box was baited with two pellets
on half of the trials at random. (Opposite
goal boxes were used with
counterbalancing groups.) A clear
preference for the two pellet side was
obtained after considerable training.

It will be convenient here to use the
function $f(x) = x^2$ as a simple representative
of our set of acceptable secondary
reinforcement functions. It
should be made clear that no commitment
to this particular function is
intended. The general line of argument
would apply equally well with
other members of the set. If we assume
for the moment that the cue
strength of one food pellet is $p$ and
two pellets is $2p$, we find that the cue
strength of the left delay chamber will
approach $p^2$ and that of the right
chamber will approach a mean value
of $.5(2p)^2$ or $2p^2$. The response of
turning left would then approach
$(p^2)^2$ or $p^4$, while the response of
turning right would approach $(2p^2)^2$
or $4p^4$. Thus, at asymptote, the
strength of right turns would exceed
that of left turns by a factor of 4.

Actually it would be hard to justify 
the assumption that the cue strength 
of two pellets is twice as great as that 
of one, but the prediction would be in 
the same direction even if this ratio 
was considerably smaller. 
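The arithmetic of this example, and the remark that the prediction survives a smaller ratio, can be traced in a few lines. This is a sketch only: it uses the illustrative function $f(x) = x^2$, introduces a hypothetical parameter `k` for the cue strength of two pellets relative to one, and takes the cue strength of the empty goal box as zero, as the mean value $.5(2p)^2$ in the text implies.

```python
def f(x):
    """Illustrative secondary reinforcement function f(x) = x**2."""
    return x ** 2


def asymptotic_strengths(p, k):
    """Asymptotic response strengths for left (one pellet) and right (two pellets).

    p : cue strength of one pellet (arbitrary unit, hypothetical)
    k : cue strength of two pellets relative to one (k = 2 in the text)
    The empty goal box is assumed to have zero cue strength.
    """
    left_chamber = f(p)                            # one pellet on every trial
    right_chamber = 0.5 * f(k * p) + 0.5 * f(0.0)  # two pellets on half the trials
    return f(left_chamber), f(right_chamber)


for k in (2.0, 1.7, 1.5, 1.3):
    left, right = asymptotic_strengths(p=1.0, k=k)
    print(f"k = {k}: right/left = {right / left:.3f}")
```

Under these assumptions the ratio of right to left strength is $(.5k^2)^2$, which equals 4 at $k = 2$ and stays above 1 as long as $k$ exceeds $\sqrt{2}$. The threshold belongs to the squaring function and would differ for other members of the acceptable set.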

These two examples were selected
because they represent cases where
current conditioning theory would not
seem to provide any prediction one
way or the other. In the case of the
chaining experiment we could appeal
to the “goal gradient” notion, but
would not find enough detail to allow
a prediction. Furthermore, the Hull-Spence
analysis of delayed reinforcements
seems to bring us right back
to the problem of secondary reinforcement.


In the case of Leventhal’s experi- 
ment we are clearly concerned with 
effects of frequency and amount of 
reinforcement, but again the relevant 
previous postulates do not provide 
enough detail for prediction. Perkins’ 
(1956) ingenious formulation, regard- 
ing the effects of reinforcement and 
appropriate preparatory responses, 
might be brought to bear, but un- 
fortunately it would seem to lead to 
the opposite prediction. Optimal pre- 
paratory responses could occur only 
on the one pellet side and thus, ac- 
cording to this formulation, reinforc- 
ing effects on this side should be 
enhanced.

These examples provide further in- 
dications of a gap in current condi- 
tioning theory which may be filled by 
development along the lines of the 
present formulation. They also re- 
emphasize the fact that a theory of 
secondary reinforcement will probably 
have implications for a diverse array 
of learning experiments, and that any 
progress here holds promise of clarify- 
ing and increasing the acuity of our 
interpretation of experiments in this 
broad class. 


COMMENT ON DIFFICULTIES IN 
PREVIOUS EXPERIMENTS 


One final matter deserves some comment.
In Part I of this paper we observed
a kind of capriciousness of
secondary reinforcing effects in ex- 
periments aimed at obtaining direct 
measures. One implication of the 
present formulation gives a suggestion 
of a possible source of these difficulties. 
Referring to the examples of accept- 
able secondary reinforcement func- 
tions in Fig. 2, we see that it is possi- 
ble for a moderately strong cue to have 
very slight secondary reinforcing 
value, a very strong cue being re- 
quired to obtain an appreciable effect. 
Thus moderately strong primary reinforcement,
which would be quite adequate
in other learning experiments,
could readily fail to establish an appreciable
secondary reinforcing effect.
Relatively small primary reinforcements
have been used in many secondary
reinforcement experiments.

Perhaps more important is the fact
that a rapid drop in cue strength is
typically seen immediately following 
the initiation of extinction. Thus, 
when secondary reinforcement tests 
are made during extinction, it is quite 
plausible to assume that cue strength 
would drop almost immediately to a 
point where secondary reinforcement 
would be ineffective. Note that in- 
termittent reinforcement training has 
been found to be favorable to second- 
ary reinforcement by Saltzman (1949) 
and by Zimmerman (1957). Here 
the sudden drop in cue strength would 
be attenuated. These considerations 
may help to explain some of the diffi- 
culties that have been encountered. 
There has been much controversy
(Burke: 1953, 1954; Hick, 1953; Jones: 
1952, 1954; Marks: 1951, 1953) regard- 
ing the use of the one-tailed test of sig- 
nificance. The important question de- 
bated is not if it should be used, but 
rather when it should be used. Kimmel 
(1957) has recently attempted to resolve 
the controversy by suggesting criteria 
for the use of one-tailed tests. He main- 
tains that one-tailed tests may be used 
when results in the opposite direction: 
(a) will not be used to determine any 
new course of behavior, (b) will be psy- 
chologically meaningless, or (c) cannot 
be deduced by any psychological theory, 
while an outcome in the expected direction
can. It is these last two instances
that will be dealt with in this paper.

When an experimenter uses a one-tailed
test and finds that his results are
in disagreement with his prediction (i.e., 
if they are in the opposite direction and 
would have been statistically significant 
had a two-tailed test been used), he can 
do one of several things. One course of 
action is simply to ignore these findings. 
In practical problems (e.g., deciding 
whether or not to introduce new machinery
for production), this approach,
which is consistent with Kimmel’s first 
criterion, is quite acceptable. With re- 
gard to this practice in psychology, on 
the other hand, Burke has quite correctly 
pointed out that “It is to be doubted 
whether experimental psychology, in its 
present state, can afford such lofty indif- 
ference toward experimental surprises” 
(Burke, 1953, pp. 385-386). Because an 
outcome is not deducible from any existing
theory does not mean that it could
not be deduced from future theories. 
The implicit assumption in this practice
is that no new theoretical approaches will
be advanced in the future, and that the
task of psychology as a science is to confirm
the presently existing theories. A
similar criticism might be made of Kimmel’s
criterion of unpredicted differences
being “psychologically meaningless.”
He defines the “possible meaning”
of a difference in the unpredicted direction
“. . . in terms of previous data and
present conditions” (Kimmel, 1957, p.
352). Whether or not it is “possible”
for a given proposition to have meaning,
however, depends upon whether or not
it is capable of confirmation (Carnap,
1953). It seems that what Kimmel is
referring to when he speaks of the “possible
meaning” of a given outcome is
actually the degree to which a proposition
regarding this outcome (i.e., one
that states that such an event does not
fit into an existing psychological theory)
has been confirmed. Thus, since “psychological
meaningfulness” in this sense
will change as our knowledge increases,
criticisms made of the criterion of theoretical
predictability apply here as well.

1 The writer is greatly indebted to K. H.
Kurtz for his critical evaluation of this
paper.

The experimenter might, on the other
hand, wish to take cognizance of his unexpected
findings. To do so, however,
he must adopt the procedure of changing
his original null hypothesis. Instead of
testing the hypothesis that $\mu_1 \le \mu_2$ (one-tailed),
he might test the hypothesis that
$\mu_1 = \mu_2$ (two-tailed).2 However, if this
practice is adopted, there will be an
increase in the probability of making a
Type I error. This results from the
combination of the probability of committing
a Type I error when using the
original null hypothesis with the probability
associated with the new null hypothesis.
For example, suppose one has
adopted the .05 level of significance when
using a one-tailed test; the probability
of making a Type I error is thus .05
(the probability associated with the critical
region in the given tail). If the null
hypothesis is changed to account for the
difference which is statistically significant
in the opposite direction, what is
actually being used now is a two-tailed
test. Were this to be used originally (i.e.,
before the study was conducted), the
probability of the experimenter committing
a Type I error would be .05 (.025
being associated with the critical point
in each direction). Since this null hypothesis
has been adopted after the decision
had been made to use a one-tailed test
(with its associated probability of making
this error), the probability of this
researcher committing a Type I error
is now .075 (i.e., .05 when $H_0\colon \mu_1 \le \mu_2$
plus .025 related to the “unexpected”
direction when $H_0\colon \mu_1 = \mu_2$). It should
also be noted that even an experimenter
who believes he has the option to use
such a procedure is operating under the
.075 level of significance, whether or
not he has occasion to test a difference
in the “unexpected” direction. Thus,
whether he knows it or not, such an
investigator is using a two-tailed test
with one tail twice as large as the other.

2 Some experimenters might make the
even greater change in their null hypothesis
by testing $\mu_1 \ge \mu_2$ (one-tailed, but in the
opposite direction).
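The .075 figure can be verified with a short computation. The sketch below assumes, purely for illustration, a test statistic that is standard normal under the null hypothesis and a predicted direction lying in the upper tail; neither assumption comes from the text, and the variable names are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # assumed sampling distribution of the statistic under the null

alpha = 0.05
upper_cut = z.inv_cdf(1 - alpha)  # one-tailed critical value, about 1.645
lower_cut = z.inv_cdf(alpha / 2)  # lower critical value of a two-tailed test, about -1.960

# The null is rejected if the statistic falls in the predicted tail
# (one-tailed rule) or, after the hypothesis is switched, in the far
# region of the opposite tail (two-tailed rule).
p_predicted = 1 - z.cdf(upper_cut)  # .05
p_unexpected = z.cdf(lower_cut)     # .025
effective_alpha = p_predicted + p_unexpected

print(f"effective Type I error rate = {effective_alpha:.3f}")
```

The two rejection regions do not overlap, so their probabilities simply add, and the region in the predicted tail carries twice the probability of the region in the unexpected tail.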

Another possible course of action 
would be to repeat the experiment, now 
using a two-tailed test. Assuming that 
the errors of measurement are not sig- 
nificantly greater in this replication and 
this experimenter obtains the same re- 
sults, he may now conclude that his 
findings in the previously unexpected 
direction are significant. Thus, the de- 
cision to use a one-tailed test may result 
in the necessity of repeating the study if 
results appear in the opposite direction. 

The above considerations indicate that 
the criteria of theoretical predictability 
and psychological meaninglessness are 
not as decisive as they may appear to be. 
Three possible courses of action available 
when results occur in the unpredicted 
direction each present difficulties.3 Ignoring
differences in the “unexpected”
direction leads to the omission of findings 
which may have important theoretical 
significance, and thus stifles any fresh 
theoretical thinking that might have 
otherwise emerged. If the differences 
are recognized by switching to a two- 
tailed test within a given study, psychol- 
ogy as a science may unwittingly be led 
to operate under conclusions with a lower 
level of statistical significance. The third 
approach, repeating the experiment and 
applying a two-tailed test to this new set
of data, might be undesirable in terms
of the time, expense, etc. that would be 
involved. The decision to use a one- 
tailed test should thus be made in light of 
the difficulties with which the investigator 
is confronted when the results occur in 
the “unexpected” direction. 
3 Whether or not there exist any other
possible approaches requires further analysis.